TEST FOR BANDEDNESS OF HIGH-DIMENSIONAL
COVARIANCE MATRICES AND BANDWIDTH ESTIMATION

By Yumou Qiu and Song Xi Chen 1 

Iowa State University, and Peking University and Iowa State University 

Motivated by the latest effort to employ banded matrices to estimate
a high-dimensional covariance $\Sigma$, we propose a test for $\Sigma$
being banded with possible diverging bandwidth. The test is adaptive to
the "large p, small n" situations without assuming a specific parametric
distribution for the data. We also formulate a consistent estimator
for the bandwidth of a banded high-dimensional covariance matrix.
The properties of the test and the bandwidth estimator are investigated
by theoretical evaluations and simulation studies, as well as an
empirical analysis of a protein mass spectroscopy data set.

1. Introduction. High-dimensional data are increasingly collected in
statistical applications, which include biological experiments, climate and
environmental studies, financial observations and others. The high
dimensionality calls for new statistical methodologies which are adaptive to
this new feature of modern statistical data. The covariance matrix
$\Sigma = \mathrm{Var}(X)$ for a p-dimensional random vector $X$ is an
important measure of the dependence among components of $X$. The sample
covariance $S_n$, constructed from n independent copies of $X$, is a key
ingredient in many statistical procedures in conventional multivariate
analysis [Anderson (2003) and Muirhead (1982)] where the data dimension p is
regarded as fixed. The widespread use of $S_n$ in the conventional
multivariate procedures is largely due to $S_n$ being a consistent estimator
of $\Sigma$ when p is fixed or small relative to the sample size n. However,
for high-dimensional data such that $p/n \to c \in (0, \infty]$, it is known
that the eigenvalues of the sample covariance matrix are no longer consistent
for their population counterparts, as demonstrated in Bai and Yin (1993),
Bai, Silverstein and Yin (1988), Johnstone (2001) and El Karoui (2011). This
means that the sample covariance $S_n$ is no longer consistent for $\Sigma$,
which hinders applications of many conventional multivariate statistical
procedures to high-dimensional data.

To overcome the problem with the sample covariance, constructing covariance
estimators via banding or tapering the sample covariance matrix has been
a focus in high-dimensional covariance estimation. Wu and Pourahmadi
(2003) considered banding the Cholesky factor matrix via kernel smoothing
estimation, which was further developed by Rothman, Levina and Zhu
(2010). Bickel and Levina (2008a) proposed banding the sample covariance
matrix directly for estimating $\Sigma$ and banding the Cholesky factor
matrix for estimating $\Sigma^{-1}$. They demonstrated that both estimators
are consistent for $\Sigma$ and $\Sigma^{-1}$, respectively, for some
"bandable" classes of covariance matrices. Cai, Zhang and Zhou (2010)
proposed a tapering estimator, which can be viewed as a soft banding of the
sample covariance and was designed to improve the banding estimator of
Bickel and Levina. They demonstrated that the tapering estimator attains the
optimal minimax rates of convergence for estimating the covariance matrix.
Wagaman and Levina (2009) developed a method for discovering meaningful
orderings of variables such that banding and tapering can be applied. Both
the banding and tapering methods for covariance estimation are well
connected to the regularization method considered in Huang et al. (2006),
Bickel and Levina (2008b), Fan, Fan and Lv (2008) and Rothman, Levina and
Zhu (2009).

Motivated by the promising results regarding banding and tapering the
sample covariance, we develop in this paper a test procedure for the
hypothesis that $\Sigma$ is banded. The rationale for developing such a test
is to check whether $\Sigma$ is in the so-called "bandable" class outlined
in Bickel and Levina (2008a), for which the banding or the tapering
estimators are consistent. There is not yet a practical guideline to confirm
or otherwise whether $\Sigma$ is within the "bandable" class so that banding
and tapering can be applied. Hence, directly testing for $\Sigma$ being
banded provides a way to gain knowledge on the structure of the covariance.
If the banded hypothesis is confirmed by the test, the banding and tapering
estimators may be employed.

Diagonal matrices are the simplest among banded matrices. Given the
importance of covariance matrices in high-dimensional multivariate
analysis, directly testing for $\Sigma$ being diagonal, and the so-called
sphericity hypothesis in classical multivariate analysis [John (1972) and
Nagao (1973)], have been considered in a set of studies including Ledoit and
Wolf (2002), Jiang (2004), Schott (2005), Chen, Zhang and Zhong (2010) and
Cai and Jiang (2011) under high dimensionality. For normally distributed
data, Jiang (2004) proposed testing for diagonal $\Sigma$ by considering a
coherence statistic $L_n = \max_{1 \le i < j \le p} |\hat\rho_{ij}|$, where
$\hat\rho_{ij}$ is the sample correlation coefficient between the ith and
the jth components of the random vector $X$. Jiang established the
asymptotic distribution of $L_n$ under the null diagonal hypothesis, which
was used to derive a sphericity test. As $L_n$ is of an extreme value type,
its convergence to its limiting distribution can be slow. Liu, Lin and Shao
(2008) proposed a modification which is shown to speed up the convergence.
Cai and Jiang (2011) extended the test of Jiang (2004) to the bandedness of
$\Sigma$, which is shown to be applicable for the "large p, small n"
situations such that $\log(p) = o(n^{1/3})$.

In this paper, we propose a nonparametric test for $\Sigma$ being banded
without assuming a parametric distribution for the high-dimensional data.
The test is formulated to allow the dimension to be much larger than the
sample size. Based on the test statistic for bandedness, we propose a
consistent estimator for the bandwidth of a banded high-dimensional
covariance. The properties of the test and bandwidth estimator are
demonstrated by theoretical evaluation, simulation studies and an empirical
analysis of a protein mass spectroscopy data set for prostate cancer.

The paper is organized as follows. Section 2 introduces the hypothe- 
ses, the assumptions and the test statistic. In Section 3, we present the 
properties of the test statistic and the test, and evaluate its power prop- 
erties. Estimation of the bandwidth is considered in Section 4. Section 5 
reports simulation results. An empirical analysis on a prostate cancer spec- 
troscopy data is outlined in Section 6. All technical details are relegated to 
the Appendix. 

2. Preliminary. Let $X_1, X_2, \ldots, X_n$ be independent and identically
distributed p-dimensional random vectors with mean $\mu$ and covariance
matrix $\Sigma = (\sigma_{ij})_{p \times p}$. A matrix
$A = (a_{ij})_{p \times p}$ is said to be banded if there exists an integer
$k \in \{0, \ldots, p-1\}$ such that $a_{ij} = 0$ for $|i-j| > k$. The
smallest k such that A is banded is called the bandwidth of A. Banding of A
at a bandwidth k refers to setting $a_{ij} = 0$ for all $|i-j| > k$.
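To make the two definitions above concrete, the following is a minimal Python sketch of banding a matrix at bandwidth k and of reading off the bandwidth of a matrix. The function names `band` and `bandwidth`, the tolerance argument and the toy matrix are illustrative assumptions and are not part of the paper.

```python
from typing import List

Matrix = List[List[float]]


def band(a: Matrix, k: int) -> Matrix:
    """Band a square matrix at bandwidth k: zero every entry with |i - j| > k."""
    p = len(a)
    return [[a[i][j] if abs(i - j) <= k else 0.0 for j in range(p)]
            for i in range(p)]


def bandwidth(a: Matrix, tol: float = 0.0) -> int:
    """Smallest k such that |a[i][j]| <= tol whenever |i - j| > k."""
    p = len(a)
    return max((abs(i - j) for i in range(p) for j in range(p)
                if abs(a[i][j]) > tol), default=0)


# Toy 4 x 4 covariance with bandwidth 1 (values are illustrative only).
sigma = [[2.0, 0.5, 0.0, 0.0],
         [0.5, 2.0, 0.5, 0.0],
         [0.0, 0.5, 2.0, 0.5],
         [0.0, 0.0, 0.5, 2.0]]
```

A matrix is banded with bandwidth at most k exactly when banding it at k leaves it unchanged, which is the form in which the hypothesis of this paper is stated.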

Let $B_k(\Sigma) = (\sigma_{ij} I\{|i-j| \le k\})_{p \times p}$ be a banded
version of $\Sigma$ with bandwidth k. Specifically, $B_0(\Sigma)$ is the
diagonal version of $\Sigma$. We intend to test

$$H_{k,0}: \Sigma = B_k(\Sigma) \quad\text{vs.}\quad H_{k,1}: \Sigma \ne B_k(\Sigma) \tag{2.1}$$

for $k = o(p^{1/4})$. Hence, the bandwidth k of $\Sigma$ to be tested can be
either fixed or diverging to infinity as long as it diverges more slowly
than $p^{1/4}$. Allowing a divergent bandwidth in the hypothesis is an
improvement over the sphericity test as considered in Ledoit and Wolf (2002)
and Chen, Zhang and Zhong (2010). It also connects to the latest works on
high-dimensional covariance estimation with banded or tapered versions of
the sample covariance as in Bickel and Levina (2008a) and Cai, Zhang and
Zhou (2010). In particular, Cai, Zhang and Zhou (2010) showed that the
optimal minimax rate for the bandwidth of the banded covariance estimator
of Bickel and Levina (2008a) is $k = O[\{n/\log(p)\}^{1/(2\alpha+1)}]$, and
that for the tapering estimator is $k = O(n^{1/(2\alpha+1)})$, where
$\alpha$ is an index value for a "bandable" class of covariances

$$\mathcal{U}(\varepsilon_0, \alpha, C) = \Big\{\Sigma : \max_j \sum_{i:\,|i-j|>k} |\sigma_{ij}| \le C k^{-\alpha} \text{ for all } k > 0, \text{ and } 0 < \varepsilon_0 \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le \varepsilon_0^{-1}\Big\}. \tag{2.2}$$

The range of bandwidths $k = o(p^{1/4})$ in the hypothesis (2.1) should
cover the above optimal rates when $p > n$.

We note that $H_{k,0}$ is valid if and only if
$\sum_{|i-j|>k} \sigma_{ij}^2 = 0$, and the latter implies that
$\mathrm{tr}\{\Sigma - B_k(\Sigma)\}^2 = 0$. A strategy is to construct an
unbiased estimator of $\mathrm{tr}\{\Sigma - B_k(\Sigma)\}^2$ and use it to
develop the test statistic. Let $D_q := \sum_{l=1}^{p-q} \sigma_{l\,l+q}^2$
be the sum of squares of the qth sub-diagonal of $\Sigma$. Then,
$\mathrm{tr}\{\Sigma - B_k(\Sigma)\}^2 = 2\sum_{q=k+1}^{p-1} D_q$. It can be
checked that an unbiased estimator of $D_q$ is

$$D_{nq} = \sum_{l=1}^{p-q}\Bigg\{\frac{1}{P_n^2}\sum_{i,j}^{*}(X_{il}X_{i\,l+q})(X_{jl}X_{j\,l+q}) - 2\frac{1}{P_n^3}\sum_{i,j,k}^{*}X_{il}X_{k\,l+q}(X_{jl}X_{j\,l+q}) + \frac{1}{P_n^4}\sum_{i,j,k,m}^{*}X_{il}X_{j\,l+q}X_{kl}X_{m\,l+q}\Bigg\},$$

where $\sum^{*}$ denotes summation over mutually different subscripts shown
and $P_n^b = n!/(n-b)!$. The reason to sum over different indices is for
easier manipulations with the mean and variance of the final test statistic
and to establish the asymptotic normality. The latter leads to a test
procedure for the bandedness.
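As a check on how the estimator of the qth sub-diagonal sum of squares is assembled, here is a brute-force Python sketch that evaluates the three sums over mutually different sample indices literally. It is an illustration under stated assumptions, not the authors' implementation: the function name `d_nq`, the list-of-rows data layout and the toy data are ours, and the cost grows like n^4 p, so it is only suitable for very small n.

```python
from itertools import permutations
from math import perm
from typing import Sequence


def d_nq(x: Sequence[Sequence[float]], q: int) -> float:
    """Brute-force estimator of the q-th sub-diagonal sum of squares.

    x is an n x p data array given as a list of rows; n >= 4 is required
    because the last sum runs over four mutually different indices.
    """
    n, p = len(x), len(x[0])
    total = 0.0
    for l in range(p - q):
        a = [row[l] for row in x]        # X_{i,l}
        b = [row[l + q] for row in x]    # X_{i,l+q}
        s2 = sum(a[i] * b[i] * a[j] * b[j]
                 for i, j in permutations(range(n), 2))
        s3 = sum(a[i] * b[k] * a[j] * b[j]
                 for i, j, k in permutations(range(n), 3))
        s4 = sum(a[i] * b[j] * a[k] * b[m]
                 for i, j, k, m in permutations(range(n), 4))
        # perm(n, b) = n! / (n - b)!, the number of ordered index tuples.
        total += s2 / perm(n, 2) - 2.0 * s3 / perm(n, 3) + s4 / perm(n, 4)
    return total


# Toy data with n = 5 observations and p = 3 variables (illustrative only).
x = [[0.3, -1.2, 0.5],
     [1.1, 0.4, -0.7],
     [-0.6, 0.9, 1.3],
     [0.2, -0.5, -1.1],
     [-1.4, 0.8, 0.1]]
shifted = [[v + 3.0 for v in row] for row in x]
```

Evaluating `d_nq` on `x` and on `shifted` gives the same value up to rounding, which reflects the location-shift invariance of the estimator.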

We consider the following statistic:

$$W_{nk} := 2\sum_{q=k+1}^{p-1} D_{nq}. \tag{2.3}$$

As each $D_{nq}$ is invariant under the location shift, $W_{nk}$ is also
location shift invariant. Hence, without loss of generality, we assume
$\mu = E(X) = 0$.

To facilitate our analysis, as in Bai and Saranadasa (1996) and Chen, Zhang
and Zhong (2010), we assume a multivariate model for the high-dimensional
data.

Assumption 1. (i) $X_1, X_2, \ldots, X_n$ are independent and identically
distributed (i.i.d.) p-dimensional random vectors such that

$$X_i = \Gamma Z_i \quad\text{for } i = 1, 2, \ldots, n, \tag{2.4}$$

where $\Gamma$ is a $p \times m$ constant matrix with $m \ge p$,
$\Gamma\Gamma' = \Sigma$, and $Z_1, \ldots, Z_n$ are i.i.d. m-dimensional
random vectors such that $E(Z_i) = 0$ and $\mathrm{Var}(Z_i) = I_m$.
(ii) Write $Z_1 = (z_{11}, \ldots, z_{1m})^T$. Each $z_{1l}$ has uniformly
bounded 8th moment, and there exist finite constants $\Delta$ and $\omega$
such that for $l = 1, \ldots, m$, $E(z_{1l}^4) = 3 + \Delta$,
$E(z_{1l}^3) = \omega$ and for any integers $\ell_\nu \ge 0$ with
$\sum_{\nu=1}^{q} \ell_\nu = 8$

$$E(z_{1i_1}^{\ell_1} z_{1i_2}^{\ell_2} \cdots z_{1i_q}^{\ell_q}) = E(z_{1i_1}^{\ell_1}) E(z_{1i_2}^{\ell_2}) \cdots E(z_{1i_q}^{\ell_q}) \tag{2.5}$$

whenever $i_1, i_2, \ldots, i_q$ are distinct subscripts.

The requirement of common third and fourth moments of $z_{1l}$ is not
essential and is purely for the sake of simpler notation. Our theory allows
different third and fourth moments as long as they are uniformly bounded,
which is actually assured by $z_{1l}$ having uniformly bounded 8th moment.

The asymptotic framework that regulates the sample size n, the
dimensionality p and the covariance $\Sigma$ is the following.

Assumption 2. As $n \to \infty$, $p = p(n) \to \infty$, $n = O(p)$ and
$\mathrm{tr}(\Sigma^4)/\mathrm{tr}^2(\Sigma^2) = O(p^{-1})$.

We note that $n = O(p)$ includes $p > n$, the "large p, small n" paradigm,
but may not imply $p = O(n)$. Different from the usual approach of
specifying an explicit growth rate of p with respect to n, Assumption 2
requires that the ratio of $\mathrm{tr}(\Sigma^4)$ to
$\mathrm{tr}^2(\Sigma^2)$ shrinks at the rate of $p^{-1}$ or faster. The
latter is stronger than
$\mathrm{tr}(\Sigma^4)/\mathrm{tr}^2(\Sigma^2) = o(1)$. It is needed due to
possible diverging bandwidths.

Let

$$\mathcal{U}_p = \{\Sigma : \mathrm{tr}(\Sigma^4)/\mathrm{tr}^2(\Sigma^2) = O(p^{-1})\}$$

be the class of covariances satisfying the last part of Assumption 2. The
class includes the "bandable" class $\mathcal{U}(\varepsilon_0, \alpha, C)$
of Bickel and Levina (2008a) given in (2.2) for the banding estimation. To
appreciate this, let $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_p$ be
the eigenvalues of $\Sigma$. If the smallest and largest eigenvalues are
bounded away from 0 and $\infty$ respectively, then

$$\frac{\mathrm{tr}(\Sigma^4)}{\mathrm{tr}^2(\Sigma^2)} \le \frac{p\lambda_p^4}{p^2\lambda_1^4} = O(p^{-1}).$$

Therefore, the "bandable" covariances are contained in $\mathcal{U}_p$. Now
suppose that $\Sigma$ has exactly $m_p$ zero eigenvalues, with
$\lambda_{m_p+1}$ being the smallest nonzero eigenvalue. Then

$$\frac{\mathrm{tr}(\Sigma^4)}{\mathrm{tr}^2(\Sigma^2)} \le \frac{\lambda_p^4}{(p - m_p)\lambda_{m_p+1}^4}.$$

Thus, $\Sigma$ is in $\mathcal{U}_p$ as long as
$\lambda_p/\lambda_{m_p+1}$ is bounded and $m_p \le cp$ for some
$c \in (0, 1)$ as $p \to \infty$. The latter means that the class
$\mathcal{U}_p$ is likely to contain the class considered in Cai, Zhang and
Zhou (2010), which allows the smallest eigenvalue to diminish to zero. It
can also be checked that the following two covariances,

$$\Sigma = (\sigma_i\sigma_j\rho^{|i-j|})_{p \times p} \quad\text{or}\quad \Sigma = (\sigma_i\sigma_j\rho^{|i-j|} I(|j-i| \le d))_{p \times p},$$

are members of $\mathcal{U}_p$ if $\{\sigma_i^2\}_{i=1}^{p}$ are uniformly
bounded away from infinity and zero.
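The trace condition in Assumption 2 can be inspected numerically for a given covariance. The sketch below, offered only as an illustration, computes p times tr(Sigma^4)/tr^2(Sigma^2) for the first of the two example covariances with unit variances and rho = 0.5 at a few dimensions. The helper names `ar1_cov` and `trace_ratio`, the choice rho = 0.5 and the dimensions 20, 40, 80 are assumptions made for the example.

```python
from typing import Dict, List


def ar1_cov(p: int, rho: float) -> List[List[float]]:
    """Sigma = (rho^{|i-j|}) with unit variances."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]


def trace_ratio(sigma: List[List[float]]) -> float:
    """tr(Sigma^4) / tr^2(Sigma^2) for a symmetric matrix Sigma."""
    p = len(sigma)
    # Sigma^2 by plain matrix multiplication (fine for small p).
    sq = [[sum(sigma[i][m] * sigma[m][j] for m in range(p))
           for j in range(p)] for i in range(p)]
    # For symmetric A, tr(A^2) is the sum of squared entries of A.
    tr2 = sum(v * v for row in sigma for v in row)
    tr4 = sum(v * v for row in sq for v in row)
    return tr4 / tr2 ** 2


# p * ratio should stay bounded as p grows if Sigma is in the class.
scaled: Dict[int, float] = {p: p * trace_ratio(ar1_cov(p, 0.5))
                            for p in (20, 40, 80)}
```

For this covariance the eigenvalues lie between 1/3 and 3, so the scaled ratio is bounded by (3 / (1/3))^4 = 6561 for every p; in practice the values in `scaled` are far smaller and change little with p.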

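3. Main results. We first describe the basic properties of the statistic
$W_{nk}$ defined in (2.3). Let

$$v_{nk}^2 = \frac{4}{n^2}\mathrm{tr}^2(\Sigma^2) + \frac{8}{n}\mathrm{tr}\{\Sigma(\Sigma - B_k(\Sigma))\}^2 + \frac{4}{n}\Delta\,\mathrm{tr}\{\Gamma'(\Sigma - B_k(\Sigma))\Gamma \circ \Gamma'(\Sigma - B_k(\Sigma))\Gamma\}, \tag{3.1}$$

where $\Omega \circ \Lambda = (\omega_{ij}\lambda_{ij})$ for two matrices
$\Omega = (\omega_{ij})$ and $\Lambda = (\lambda_{ij})$.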

Proposition 1. Under Assumptions 1 and 2,

$$E(W_{nk}) = \mathrm{tr}[\{\Sigma - B_k(\Sigma)\}^2] \quad\text{and}\quad \mathrm{Var}(W_{nk}) = v_{nk}^2 + o(v_{nk}^2).$$

The proposition indicates that under $H_{k,0}$,

$$E(W_{nk}) = 0 \quad\text{and}\quad v_{nk} = 2\,\mathrm{tr}[\{B_k(\Sigma)\}^2]/n,$$

and $v_{nk}^2$ is the leading order variance of $W_{nk}$. It can be shown
that $\mathrm{tr}\{\Sigma(\Sigma - B_k(\Sigma))\}^2 \le 4(k+1)^2\,\mathrm{tr}(\Sigma^4)$.
Since

$$\mathrm{tr}\{\Gamma'(\Sigma - B_k(\Sigma))\Gamma \circ \Gamma'(\Sigma - B_k(\Sigma))\Gamma\} \le \mathrm{tr}\{\Sigma(\Sigma - B_k(\Sigma))\}^2,$$

$\Delta \ge -2$ and
$\mathrm{tr}(\Sigma^4)/\mathrm{tr}^2(\Sigma^2) = O(p^{-1})$, we have

$$4n^{-2}\,\mathrm{tr}^2(\Sigma^2) \le v_{nk}^2 \le C a_{np}\,\mathrm{tr}^2(\Sigma^2) \tag{3.2}$$

for a constant $C \ge 4$ and $a_{np} = n^{-2} + k^2(np)^{-1}$. We note that
$a_{np} \to 0$ as $n \to \infty$ since $k = o(p^{1/4})$. In particular, if
k is fixed, $a_{np} = O(n^{-2})$.
The following theorem establishes the asymptotic normality of $W_{nk}$.

Theorem 1. Under Assumptions 1 and 2, and if $k = o(p^{1/4})$,

$$\frac{W_{nk} - \mathrm{tr}[\{\Sigma - B_k(\Sigma)\}^2]}{v_{nk}} \xrightarrow{d} N(0, 1).$$

In order to formulate a test procedure based on the asymptotic normality,
we need to estimate $\mathrm{tr}[\{B_k(\Sigma)\}^2]$ since
$v_{nk} = 2\,\mathrm{tr}[\{B_k(\Sigma)\}^2]/n$ under $H_{k,0}$. Let
$V_{nk} := D_{n0} + 2\sum_{q=1}^{k} D_{nq}$ be the estimator, whose
consistency for $\mathrm{tr}[\{B_k(\Sigma)\}^2]$ is implied by the following
proposition.

Proposition 2. Under Assumptions 1 and 2,
$\mathrm{Var}\{V_{nk}/\mathrm{tr}(\Sigma^2)\} = O(a_{np})$,
where $a_{np} = n^{-2} + k^2(np)^{-1}$.

Since $E(V_{nk}) = \mathrm{tr}[\{B_k(\Sigma)\}^2]$ and $a_{np} \to 0$,
Proposition 2 means that, under $H_{k,0}$,
$V_{nk}/\mathrm{tr}[\{B_k(\Sigma)\}^2] \xrightarrow{p} 1$ as $n \to \infty$.
This together with Theorem 1 indicates that under $H_{k,0}$

$$T_{nk} =: n\frac{W_{nk}}{V_{nk}} \xrightarrow{d} N(0, 4).$$

This leads to our choice of $T_{nk}$ as the test statistic and the proposed
test of size $\alpha$ that rejects $H_{k,0}$ if $T_{nk} > 2z_\alpha$, where
$z_\alpha$ is the upper $\alpha$ quantile of $N(0, 1)$.
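The decision rule can be written down in a few lines once the two statistics have been computed. The following Python sketch takes the values of W_nk and V_nk as given inputs and returns the test statistic, an approximate p-value based on the N(0, 4) limit, and the rejection decision. The function name `bandedness_test` and its return layout are assumptions for illustration; the p-value is our addition and follows directly from the stated limiting distribution.

```python
from statistics import NormalDist
from typing import Tuple


def bandedness_test(w_nk: float, v_nk: float, n: int,
                    alpha: float = 0.05) -> Tuple[float, float, bool]:
    """Return (T_nk, approximate p-value, reject) for the null H_{k,0}.

    w_nk and v_nk are the already computed statistics W_nk and V_nk.
    """
    t_nk = n * w_nk / v_nk
    std_normal = NormalDist()
    # Under the null, T_nk is approximately N(0, 4), so T_nk / 2 is N(0, 1).
    p_value = 1.0 - std_normal.cdf(t_nk / 2.0)
    z_alpha = std_normal.inv_cdf(1.0 - alpha)  # upper-alpha normal quantile
    return t_nk, p_value, t_nk > 2.0 * z_alpha
```

The test is one-sided: only large positive values of the statistic count as evidence against bandedness, since W_nk estimates a nonnegative quantity that is zero under the null.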

As Theorem 1 prescribes the asymptotic normality under both $H_{k,0}$
and $H_{k,1}$, it permits a power evaluation of the test. Let

$$\delta_{nk} = \mathrm{tr}[\{\Sigma - B_k(\Sigma)\}^2]/v_{nk}, \tag{3.3}$$

which may be viewed as a signal to noise ratio for the testing problem. This
is because $\mathrm{tr}[\{\Sigma - B_k(\Sigma)\}^2]$ is the square of the
Frobenius norm of the difference between $\Sigma$ and its k-banded version,
and $v_{nk}$ measures the level of noise in the statistic $W_{nk}$. Then,
the power of the test under $H_{k,1}: \Sigma \ne B_k(\Sigma)$ is

$$\beta_{nk} = P\{nW_{nk}/V_{nk} > 2z_\alpha \mid \Sigma \ne B_k(\Sigma)\} = P\bigg(\frac{W_{nk} - \mathrm{tr}(\Sigma^2) + \mathrm{tr}[\{B_k(\Sigma)\}^2]}{v_{nk}} > \frac{2z_\alpha V_{nk}}{n v_{nk}} - \delta_{nk}\bigg).$$

Since $v_{nk} \ge 2n^{-1}\mathrm{tr}(\Sigma^2)$, we have
$2V_{nk}/(n v_{nk}) \le V_{nk}/\mathrm{tr}(\Sigma^2)$ for n large. Hence
asymptotically,

$$\beta_{nk} \ge P\bigg(\frac{W_{nk} - \mathrm{tr}(\Sigma^2) + \mathrm{tr}[\{B_k(\Sigma)\}^2]}{v_{nk}} > z_\alpha\frac{V_{nk}}{\mathrm{tr}(\Sigma^2)} - \delta_{nk}\bigg). \tag{3.4}$$

To gain more insight on the power, let
$r_k = \mathrm{tr}[\{B_k(\Sigma)\}^2]/\mathrm{tr}(\Sigma^2)$. Clearly,
$r_k \le 1$ and $r_k$ is monotone nondecreasing with respect to k. If
$\Sigma$ is banded with bandwidth $k_0$, then

$$r_k < 1 \text{ for } k < k_0 \quad\text{and}\quad r_k = 1 \text{ for } k \ge k_0. \tag{3.5}$$

From the bounds for $v_{nk}$ in (3.2), it follows that

$$(C a_{np})^{-1/2}(1 - r_k) \le \delta_{nk} \le \tfrac{1}{2}n(1 - r_k), \tag{3.6}$$

which indicates that $a_{np}^{-1/2}(1 - r_k) = O(\delta_{nk})$. When k is
fixed, $a_{np} = O(n^{-2})$ and $\delta_{nk} \asymp n(1 - r_k)$, indicating
that $\delta_{nk}$ is at the exact order of $n(1 - r_k)$.

Theorem 2. Under Assumptions 1 and 2, $H_{k,1}$ and if $k = o(p^{1/4})$,
then:

(i) $\liminf_n \beta_{nk} \ge 1 - \Phi(z_\alpha - \liminf_n \delta_{nk})$;

(ii) if $a_{np}^{-1/2}(1 - r_k) \to \infty$, then $\beta_{nk} \to 1$ as
$n \to \infty$.

Theorem 2 indicates that the proposed test is consistent as long as the
speed of $1 - r_k \to 0$ under $H_{k,1}$ is not faster than $a_{np}^{1/2}$.
The test will have nontrivial power as long as
$\liminf_n \delta_{nk} > 0$. If $n(1 - r_k) \to 0$, the test will have no
power beyond the significance level $\alpha$. We note that this happens
when $H_{k,0}$ and $H_{k,1}$ are extremely close to each other, so that
$1 - r_k$ decays to zero faster than $n^{-1}$. We are actually a little
amazed by the fact that the test is powerful as long as
$\liminf_n a_{np}^{-1/2}(1 - r_k) > 0$, or equivalently $1 - r_k$ does not
shrink to zero faster than $a_{np}^{1/2}$, despite the high dimensionality
and a possible diverging bandwidth k. Theorem 2 and (3.6) together imply
that if $r_k$ does not vary much as p increases, the power of the test will
be largely determined by n, as confirmed by our simulation study in
Section 5.
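To see how the quantities in the power discussion interact, the following Python sketch computes the ratio r_k for a given covariance, the upper bound n(1 - r_k)/2 on the signal to noise ratio from (3.6), and the limiting lower bound 1 - Phi(z_alpha - delta) from Theorem 2(i) as a function of a supplied delta. The function names and the toy covariance are assumptions for illustration only; the sketch does not compute the noise level v_nk itself.

```python
from statistics import NormalDist
from typing import List


def r_k(sigma: List[List[float]], k: int) -> float:
    """Ratio of the squared Frobenius norm of the k-banded part to the total."""
    p = len(sigma)
    total = sum(v * v for row in sigma for v in row)
    banded = sum(sigma[i][j] * sigma[i][j]
                 for i in range(p) for j in range(p) if abs(i - j) <= k)
    return banded / total


def snr_upper_bound(sigma: List[List[float]], k: int, n: int) -> float:
    """Upper bound n (1 - r_k) / 2 on the signal to noise ratio, from (3.6)."""
    return 0.5 * n * (1.0 - r_k(sigma, k))


def power_lower_bound(delta_nk: float, alpha: float = 0.05) -> float:
    """1 - Phi(z_alpha - delta_nk), the limiting bound in Theorem 2(i)."""
    nd = NormalDist()
    return 1.0 - nd.cdf(nd.inv_cdf(1.0 - alpha) - delta_nk)


# Toy 6 x 6 covariance with bandwidth 2 (entries are illustrative only).
sigma = [[max(0.0, 1.0 - 0.4 * abs(i - j)) for j in range(6)]
         for i in range(6)]
```

With a zero signal the bound reduces to the nominal level alpha, and it increases with the signal to noise ratio; since n(1 - r_k)/2 caps that ratio, a null bandwidth whose r_k is very close to 1 needs a large n before the bound moves away from alpha.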

Our proposed test is targeted on the covariance matrix $\Sigma$. A test for
the correlation matrix can be developed by modifying the test statistic by
first standardizing each data dimension via its sample standard deviation.
The theoretical justification would be quite involved, and would require
extra effort. In addition to being invariant under the location shift, the
test statistic is invariant if all the variables in the high-dimensional
data vector are transformed by a common scale. However, the proposed test
statistic is not invariant under variable-specific scale transformations.
The above mentioned test for the correlation matrix would be invariant
under variable-specific scale transformations.

4. Bandwidth estimation. We propose in this section an estimator of the
bandwidth of a banded covariance $\Sigma$. Estimating the bandwidth of a
banded covariance matrix is an important and practical issue, given the
latest advances on covariance estimation by banding [Bickel and Levina
(2008a)] or tapering [Cai, Zhang and Zhou (2010)] sample covariance
matrices. Indeed, finding an adequate bandwidth is a prerequisite for
applying either the banding or tapering estimators.

The proposed estimator is motivated by the test procedure developed
in the previous section. Let $k_0$ be the true bandwidth. As the proposed
test is consistent as long as $r_k \to 1$ not too fast, and the sample size
is large enough (it can still be much less than p), the proposed test would
reject (not reject) $H_{k,0}$ for k less (larger) than $k_0$. An immediate
but rather naive strategy would be to use the smallest integer k such that
$H_{k,0}$ is not rejected as the bandwidth estimator. However, this strategy
may be insufficient to counter "abnormal" samples which can produce larger
(smaller) values of the statistic $\tilde T_{nk} := W_{nk}/V_{nk}$
consistently for a wide range of k values, when in fact $H_{k,0}$
($H_{k,1}$) is true. And yet these "abnormal" samples are expected within
the normal range of variations. To make the estimator robust against these
"abnormal" samples and not so much dependent on the significance level
$\alpha$, we consider an estimator based on the difference between
successive statistics, $d_{nk} = \tilde T_{nk} - \tilde T_{n,k+1}$.
We assume the true bandwidth $k_0$ to be either fixed or diverging as long
as

$$k_0(n^{-1/2} + p^{-1/4}) \to 0 \quad\text{as } n \to \infty, \tag{4.1}$$

which covers a quite wide range for the bandwidth. Note that

$$\tilde T_{nk} = \frac{W_{nk} - E(W_{nk})}{v_{nk}}\,\frac{v_{nk}}{V_{nk}} + \frac{E(W_{nk})}{V_{nk}}.$$

For $k \le M$, where $M = o(p^{1/4})$ is a pre-chosen sufficiently large
integer, $\{W_{nk} - E(W_{nk})\}/v_{nk}$ is stochastically bounded
(Theorem 1) and from (3.2), we have

$$\tilde T_{nk} = O_p\bigg(\frac{a_{np}^{1/2} r_k^{-1}\,\mathrm{tr}[\{B_k(\Sigma)\}^2]}{V_{nk}}\bigg) + (r_k^{-1} - 1)\frac{\mathrm{tr}[\{B_k(\Sigma)\}^2]}{V_{nk}}.$$

Let $b_{nk} = V_{nk}/\mathrm{tr}[\{B_k(\Sigma)\}^2] - 1$. From
Propositions 1 and 2,

$$E(b_{nk}) = 0 \quad\text{and}\quad \mathrm{Var}(b_{nk}) = O(a_{np} r_k^{-2}). \tag{4.2}$$

Since $\Sigma = B_{k_0}(\Sigma)$ is nonnegative definite,
$\mathrm{tr}(\Sigma^2) \le (2k_0 + 1)\,\mathrm{tr}[\{B_0(\Sigma)\}^2]$.
Hence, for any k, $r_k \ge (2k_0 + 1)^{-1}$. These imply that

$$\tilde T_{nk} = O_p(a_{np}^{1/2} k_0) + (r_k^{-1} - 1)\{1 + o_p(1)\}. \tag{4.3}$$

It can be checked that $a_{np}^{1/2} k_0 \to 0$ under (4.1), which makes the
first term on the right of the above equation negligible relative to the
second term. And the second term is quite indicative between $k < k_0$ and
$k \ge k_0$, since $r_k = 1$ for $k \ge k_0$.
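The lower bound on r_k used above rests on an elementary inequality for nonnegative definite matrices. A short derivation, written here as a sketch in the notation of this section, is the following: every 2 x 2 principal minor of a nonnegative definite matrix is nonnegative, each row of a matrix with bandwidth k_0 has at most 2k_0 + 1 nonzero entries, and r_k is nondecreasing in k.

```latex
\begin{align*}
\sigma_{ij}^2 &\le \sigma_{ii}\sigma_{jj} \le \tfrac{1}{2}(\sigma_{ii}^2 + \sigma_{jj}^2), \\
\operatorname{tr}(\Sigma^2)
  &= \sum_{|i-j| \le k_0} \sigma_{ij}^2
   \le \tfrac{1}{2} \sum_{|i-j| \le k_0} (\sigma_{ii}^2 + \sigma_{jj}^2)
   = \sum_{|i-j| \le k_0} \sigma_{ii}^2
   \le (2k_0 + 1) \sum_{i=1}^{p} \sigma_{ii}^2
   = (2k_0 + 1)\operatorname{tr}[\{B_0(\Sigma)\}^2], \\
r_k &\ge r_0 = \frac{\operatorname{tr}[\{B_0(\Sigma)\}^2]}{\operatorname{tr}(\Sigma^2)} \ge (2k_0 + 1)^{-1}.
\end{align*}
```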

To amplify the second term when $k < k_0$ while not inflating the first
term on the right of (4.3) too much, we consider multiplying
$\tilde T_{nk}$ by $n^{\delta}$ for a small positive $\delta$ and let
$d_{nk}^{(\delta)} = n^{\delta}(\tilde T_{nk} - \tilde T_{n,k+1})$. The
proposed bandwidth estimator is

$$\hat k_{\delta,\theta} = \min\{k : |d_{nk}^{(\delta)}| < \theta\} \tag{4.4}$$

for a pair of tuning parameters $\delta > 0$ and $\theta > 0$. The
following theorem gives the consistency of the bandwidth estimator for both
fixed and diverging $k_0$.

Theorem 3. Under Assumptions 1 and 2, if
$\liminf_n \{\inf_{k < k_0}(r_{k+1} - r_k)\} > 0$, then for any
$\theta > 0$, $\hat k_{\delta,\theta} - k_0 \xrightarrow{p} 0$ under either
of the two settings: (i) for any $\delta \in (0, 1)$ if $k_0$ is bounded;
(ii) for any $\delta \in (0, 1/2]$ if $k_0$ is diverging but satisfies
(4.1), and $\{\sigma_{ll}\}_{l=1}^{p}$ are uniformly bounded away from 0
and $\infty$.
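The thresholding rule (4.4) is simple to apply once the ratios W_nk / V_nk are available for consecutive k. The Python sketch below assumes they are supplied as a sequence indexed by k = 0, 1, ...; the function name `threshold_bandwidth`, the convention of returning `None` when no k qualifies, and the toy input are assumptions for illustration. The default tuning values 0.5 and 0.06 are the ones used in the simulations of Section 5.

```python
from typing import Optional, Sequence


def threshold_bandwidth(t_tilde: Sequence[float], n: int,
                        delta: float = 0.5,
                        theta: float = 0.06) -> Optional[int]:
    """Smallest k whose scaled successive difference falls below theta.

    t_tilde[k] holds W_nk / V_nk for k = 0, 1, ..., len(t_tilde) - 1.
    Returns None if no candidate k satisfies the threshold.
    """
    for k in range(len(t_tilde) - 1):
        d_scaled = n ** delta * (t_tilde[k] - t_tilde[k + 1])
        if abs(d_scaled) < theta:
            return k
    return None


# Toy ratios that level off at k = 3 (values are illustrative only).
t_tilde = [0.9, 0.5, 0.2, 0.004, 0.001, 0.0]
```

With n = 60 the scaled differences for k = 0, 1, 2 are well above 0.06 while the one at k = 3 is about 0.023, so the rule returns 3 for this input.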

We would like to remark that the multiplier $n^{\delta}$ in the formation
of $d_{nk}^{(\delta)}$ leads to $\theta$ being "free ranged" as long as
$\theta > 0$. If such multiplication is not applied, namely by setting
$\delta = 0$, the range of $\theta$ needs to be restricted properly to
ensure convergence. The requirement of
$\liminf_n \{\inf_{k < k_0}(r_{k+1} - r_k)\} > 0$ is to avoid situations
where $\Sigma$ has segments of zero sub-diagonals followed by nonzero
sub-diagonals when one moves away from the main diagonal. Our estimator can
be modified to suit such situations. However, we do not elaborate here for
the sake of simplicity in the presentation. Attaining the consistency of
$\hat k_{\delta,\theta}$ with diverging $k_0$ requires a smaller $\delta$
value.

Fig. 1. Box-plots of the modified statistics $n^{\delta}\tilde T_{nk}$
[panel (a)] and their first order differences [panel (b)] for the simulated
data. The dashed line in the right panel is $\theta = 0.06$. The true
bandwidth is 5.

To better understand the theorem and the bandwidth estimator, we conducted
a simulation study for $k_0 = 5$, $n = 60$ and $p = 600$ with $X_i$
generated from Model (5.1) with a multivariate normal distribution. The
detailed simulation setting will be provided in Section 5. Figure 1 presents
box-plots of the modified statistics $n^{\delta}\tilde T_{nk}$ (left panel)
and their first-order differences $d_{nk}^{(\delta)}$ (right panel), with
$\delta = 0.5$. We see from the right panel that the first five boxes are
relatively large, and $d_{nk}^{(\delta)}$ is close to 0 for $k \ge 5$. This
indicates that five would be the bandwidth estimate.

In practical implementations with finite samples, the bandwidth estimator
may be sensitive to the tuning parameters $\delta$ and $\theta$. Note that,
as revealed a few paragraphs earlier, $d_{nk}$ should be significantly
larger than 0 for $k < k_0$ and close to 0 for $k \ge k_0$. Such a pattern,
as displayed in Figure 1, indicates that $k_0$ is a change point for the
sequence $\{d_{nk}\}$. This motivates us to consider a regression
change-point detection algorithm for bandwidth estimation. Consider
$d_{nj}$, the difference between successive statistics $\tilde T_{nj}$, for
$j = 1, \ldots, M$, for a sufficiently large M that covers the true
bandwidth $k_0$. The idea is to fit, at each candidate k, a regression
function $g_k(j)$ to $\{d_{nj}\}_{j=1}^{M}$ such that $g_k(j) = g_k(k)$ for
all $j \ge k$. We may fit a nonparametric, locally weighted linear
regression [Cleveland and Devlin (1988); Fan and Gijbels (1996)] on
$j \in L_k = \{l : 0 \le l \le k\}$ to the left of k with the smoothing
window-width $hk$, where h is a smoothing parameter, and fit a flat line at
the level $d_{nk}$ for $j \in R_k = \{l : k + 1 \le l \le M\}$ to the right
of k. If k is too small for the above nonparametric regression, a
parametric polynomial regression may be conducted. Let $\hat g_k(j)$ be the
regression estimate, nonparametric or parametric, obtained over the set
$L_k$, and let

$$\mathrm{err}(k) = \sum_{j \in L_k} |\hat g_k(j) - d_{nj}| + \sum_{j \in R_k} |d_{nk} - d_{nj}|$$

be the sum of absolute deviations of the fitted errors. Then a bandwidth
estimator, which we call the change-point estimator, is

$$\hat k = \arg\min_k \{\mathrm{err}(k) : 1 \le k \le M\}. \tag{4.5}$$

Our empirical studies reported in Section 5 show that this estimator worked
quite well.
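The change-point criterion can be sketched in a few lines of Python. To keep the example self-contained, the left-hand fit below is an ordinary least-squares straight line, that is, the parametric option mentioned in the text, and not the locally weighted regression used in the paper; this substitution, the function names, the choice of indexing the differences from j = 0 and the toy sequence are all assumptions made for illustration.

```python
from typing import List, Sequence


def _line_fit(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Fitted values of an ordinary least-squares straight line."""
    m = len(xs)
    xbar = sum(xs) / m
    ybar = sum(ys) / m
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return [ybar + slope * (x - xbar) for x in xs]


def change_point_bandwidth(d: Sequence[float]) -> int:
    """Minimiser of err(k) over 1 <= k <= M, where d[j] = d_nj, j = 0..M."""
    big_m = len(d) - 1
    best_k, best_err = 1, float("inf")
    for k in range(1, big_m + 1):
        left = list(range(k + 1))                      # indices to the left of k
        fitted = _line_fit(left, [d[j] for j in left])
        err = sum(abs(f - d[j]) for f, j in zip(fitted, left))
        # Flat line at level d[k] over the indices to the right of k.
        err += sum(abs(d[k] - d[j]) for j in range(k + 1, big_m + 1))
        if err < best_err:
            best_k, best_err = k, err
    return best_k


# Toy differences that decrease linearly and vanish from j = 5 on.
d = [0.5, 0.4, 0.3, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
```

For this noise-free sequence the criterion is essentially zero at k = 5 and strictly positive elsewhere, so the sketch returns 5; with real data the left-hand fit would normally be a smoother, as described above.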

Bickel and Levina (2008a, 2008b) proposed a method to select the bandwidth
based on repeated random splitting of the original sample into two
sub-samples of sizes $n_1$ and $n_2 = n - n_1$. Let $\hat\Sigma_1^{(\nu)}$
and $\hat\Sigma_2^{(\nu)}$ be the sample covariances based on the
sub-samples of sizes $n_1$ and $n_2$ respectively, where $\nu$ denotes the
$\nu$th split, for $\nu = 1, \ldots, N$, and N is the total number of
sample splits. The risk for each candidate k is defined to be
$R(k) = E\|B_k(\hat\Sigma) - \Sigma\|_{(1,1)}$, where for a
$p_1 \times p_2$ matrix $A = (a_{ij})$,
$\|A\|_{(1,1)} = \max_{1 \le j \le p_2} \sum_{i=1}^{p_1} |a_{ij}|$. An
empirical version of the risk is

$$\hat R(k) = \frac{1}{N}\sum_{\nu=1}^{N} \|B_k(\hat\Sigma_1^{(\nu)}) - \hat\Sigma_2^{(\nu)}\|_{(1,1)}, \tag{4.6}$$

and the bandwidth estimator is

$$\hat k_{BL} = \arg\min_{0 \le k \le p-1} \hat R(k).$$

Bickel and Levina (2008a) recommended $n_1$ to be $n/3$, and the number
of random splits $N = 50$, while Bickel and Levina (2008b) suggested
$n_1 = n(1 - 1/\log n)$ and using the Frobenius norm instead of the
$\|\cdot\|_{(1,1)}$ norm. Rothman, Levina and Zhu (2010) considered a
similar method to select the bandwidth in their estimator. We note that
these approaches can be adversely impacted by high dimensionality, due to
the fact that $\hat\Sigma_2^{(\nu)}$ may be a poor estimator of $\Sigma$ if
p is much larger than n, as found in earlier works [Johnstone (2001); Bai
and Silverstein (2005)].
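For comparison, the random-splitting selector just described can be sketched as follows. This is an illustrative Python version under stated assumptions and not the code of Bickel and Levina: the function names, the divisor n - 1 in the sample covariance, and the reuse of the same random splits for every candidate k (via a fixed seed) are our choices.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def sample_cov(x: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance of an n x p data array (divisor n - 1)."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in x)
             / (n - 1) for j in range(p)] for i in range(p)]


def norm_11(a: Matrix) -> float:
    """Matrix (1,1) norm: the maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def split_risk(x: Sequence[Sequence[float]], k: int, n1: int,
               n_splits: int = 50, seed: int = 0) -> float:
    """Empirical risk (4.6) for candidate bandwidth k."""
    rng = random.Random(seed)
    n, p = len(x), len(x[0])
    total = 0.0
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        s1 = sample_cov([x[i] for i in idx[:n1]])
        s2 = sample_cov([x[i] for i in idx[n1:]])
        diff = [[(s1[i][j] if abs(i - j) <= k else 0.0) - s2[i][j]
                 for j in range(p)] for i in range(p)]
        total += norm_11(diff)
    return total / n_splits


def bl_bandwidth(x: Sequence[Sequence[float]], n1: int,
                 n_splits: int = 50, seed: int = 0) -> int:
    """Candidate k in 0..p-1 that minimises the empirical risk."""
    p = len(x[0])
    risks = [split_risk(x, k, n1, n_splits, seed) for k in range(p)]
    return min(range(p), key=risks.__getitem__)
```

Because each split compares a banded covariance from one half with an unbanded sample covariance from the other half, the quality of the second sample covariance directly affects the selected bandwidth, which is the concern raised above for p much larger than n.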

5. Simulation results. In this section, we report results from simulation
studies to verify the proposed test for the bandedness and the bandwidth
estimator. We evaluate the performance of the proposed test under several
different structures of covariance matrix for normal and gamma random
vectors. We generate p-dimensional independent and identically distributed
multivariate random vectors $X_i = (X_{i1}, \ldots, X_{ip})'$ according to
a model

$$X_{ij} = \sum_{l=0}^{k_0} \gamma_l Z_{i\,j+l}, \tag{5.1}$$

where $k_0$ is the bandwidth of the covariance, $\gamma_0 = 1$ in all
settings and the other coefficients $\gamma_l$ will be specified shortly.
Two distributions are assigned to the i.i.d. $Z_{ij}$: (i) the normal
distribution $N(0, 1)$; (ii) the standardized Gamma(1, 0.5) distribution so
that it has zero mean and unit variance. To mimic the "large p, small n"
paradigm, we choose n = 20, 40, 60 and p = 50, 100, 300, 600, respectively.
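The data-generating model (5.1) is a special case of the factor-type model in Assumption 1, with a p x (p + k_0) coefficient matrix whose jth row carries the coefficients in columns j to j + k_0. The Python sketch below builds that matrix, the implied population covariance, and simulated samples. The function names, the use of the standard `random` module and the reading of the second Gamma parameter as a scale are assumptions for illustration; because the innovations are standardized and the shape is 1, the scale value does not affect their distribution.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def gamma_matrix(p: int, gammas: Sequence[float]) -> Matrix:
    """p x (p + k0) coefficient matrix with row j holding gamma_0..gamma_k0."""
    k0 = len(gammas) - 1
    g = [[0.0] * (p + k0) for _ in range(p)]
    for j in range(p):
        for l, coef in enumerate(gammas):
            g[j][j + l] = coef
    return g


def population_cov(p: int, gammas: Sequence[float]) -> Matrix:
    """Population covariance implied by the model: the Gram matrix of the rows."""
    g = gamma_matrix(p, gammas)
    return [[sum(a * b for a, b in zip(g[i], g[j])) for j in range(p)]
            for i in range(p)]


def simulate(n: int, p: int, gammas: Sequence[float],
             dist: str = "normal", seed: int = 0) -> Matrix:
    """Draw n vectors with X_ij = sum_l gamma_l * Z_{i, j + l}."""
    rng = random.Random(seed)
    k0 = len(gammas) - 1
    scale = 0.5  # assumed scale; shape 1 gives mean = sd = scale

    def draw() -> float:
        if dist == "normal":
            return rng.gauss(0.0, 1.0)
        return (rng.gammavariate(1.0, scale) - scale) / scale

    data = []
    for _ in range(n):
        z = [draw() for _ in range(p + k0)]
        data.append([sum(c * z[j + l] for l, c in enumerate(gammas))
                     for j in range(p)])
    return data
```

For example, with coefficients (1, 0.4, 0.4) the implied covariance has 1.32 on the diagonal, 0.56 and 0.4 on the first two sub-diagonals, and zeros beyond, so its bandwidth equals the number of lagged coefficients.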

We first evaluate the size of the proposed test under the null hypothesis
$H_{k,0}: \Sigma = B_k(\Sigma)$ for k = 0 (diagonal), 1, 2 and 5. The
coefficients $\gamma_l$ for $l > 0$ are: $\gamma_1 = 1$ and 0.5,
respectively, for k = 1; $\gamma_1 = \gamma_2 = 1$, and $\gamma_1 = 0.5$
and $\gamma_2 = 0.25$, respectively, for k = 2; and
$\gamma_1 = \cdots = \gamma_5 = 0.4$ for k = 5. To assess the power, we
generate data according to (5.1) so that $\Sigma = B_k(\Sigma)$ and test
for $H_{k-1,0}: \Sigma = B_{k-1}(\Sigma)$ for k = 2 and 5, respectively,
with the $\gamma_l$ values being the same as those for the corresponding k
in the simulation for the size reported above. We note that this design,
having the bandwidth of the null hypothesis adjacent to the true bandwidth,
is the hardest for the test, as the null and the alternative are the
closest, given the setting of the parameters $\{\gamma_l\}$. All the
simulation results are based on 1000 simulations.

We also evaluate the test proposed in Cai and Jiang (2011), based on 
the asymptotic distribution of the coherence statistic L n under the same 
simulation settings used for the proposed test. The test encountered a very 
severe size distortion in that the real sizes are much less than the nominal 
level of 5%, which also caused the power of the test to be unfavorably low. 
For these reasons, we will not report the simulation results of the test. The 
coherence statistic is the largest absolute Pearson correlation coefficient
among all pairs of different components in X, and is an extreme value-type
statistic. Extreme value statistics are known to converge slowly, and a
computing intensive method is needed to speed up the convergence. The
asymptotic
distribution established in Cai and Jiang (2011) may be the foundation to 
justify such a method. 

Table 1 reports the empirical sizes of the proposed test at the 5% nomi- 
nal significance for H^ q with k = 0, 1,2 and 5, respectively, under both the 
normal and gamma distributions. Table 2 summarizes the empirical power 
of the tests whose sizes are reported in Table 1. To understand the power 
results, Table 2 also contains the values of 1 — for each simulation setting. 
We observe from Table 1 that the test has reasonably empirical sizes, around 
5%, and that the test is not sensitive to the dimensionality indicated by its 
robust performance. There is some size inflation, which is due to a number 
of factors, mainly to the dimensionality p, the sample size n and the ap- 
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Table 1 

Empirical sizes of the proposed test at 5% significance for the normal and gamma 
random vectors generated according to model (5.1) 



Normal Gamma 



P P 



n 


50 


100 


300 


600 50 


100 


300 


600 


















oa 
21) 


a a^a 

u.uoy 


U.UDO 


a r\R i 
U.UOl 


U.UDO U.Uoo 


U.UOO 


U.UDO 


A A7£ 

U.Uro 


4U 


U.UO ( 


u.U4y 


U.U4 ( 


a a^a a nccfi 
U.UDU u.uoo 


a ak a 
U.UQ4 


A A£ K 
U.UOO 


A A£A 

u.uoy 


60 


0.066 


0.064 


0.045 


0.051 0.068 


0.039 


0.065 


0.049 










Jzlo ■ — -til(lj) 
















7i — i 








o a 
III 




0.061 


0.056 


a ntiA a Atio 
U.UDU U.UOz 


A A£ Q 
U.UOO 


a Atin 

u.uoy 


A AfiA 

u.uoy 


40 


0.061 


0.048 


0.048 


0.069 0.059 


0.049 


0.069 


0.075 


60 


0.045 


0.053 


0.056 


0.067 0.048 


0.061 


0.068 


0.059 










7i =0.5 








20 


0.065 


0.069 


0.058 


0.067 0.063 


0.061 


0.057 


0.061 


40 


0.063 


0.052 


0.047 


0.068 0.059 


0.055 


0.066 


0.071 


60 


0.050 


0.056 


0.057 


0.061 0.050 


0.070 


0.068 


0.060 










(c) # :£ = B 2 (E) 
















71 = 72 = 1 








20 


0.058 


0.050 


0.055 


0.058 0.056 


0.046 


0.062 


0.062 


40 


0.049 


0.042 


0.051 


0.058 0.059 


0.048 


0.076 


0.071 


60 


0.050 


0.043 


0.065 


0.064 0.040 


0.063 


0.065 


0.052 










7i =0.5, 72 =0.25 








20 


0.060 


0.055 


0.056 


0.061 0.059 


0.054 


0.062 


0.062 


40 


0.055 


0.047 


0.055 


0.059 0.058 


0.046 


0.071 


0.064 


60 


0.044 


0.043 


0.058 


0.060 0.042 


0.060 


0.067 


0.061 






(d) H : E = 


B 5 (S) with 71 = ••• = 75 


= 0.4 






20 


0.045 


0.058 


0.067 


0.059 0.050 


0.061 


0.054 


0.064 


40 


0.043 


0.054 


0.049 


0.061 0.041 


0.052 


0.065 


0.064 


60 


0.031 


0.046 


0.065 


0.069 0.034 


0.040 


0.053 


0.048 



proximation error of the finite sample distribution of the test statistic by 
the limiting normal distribution. We recall that the test statistic is a lin- 
ear combination of [/-statistics, whose convergence to the limiting normal 
distribution can be slow. In the simulations for power evaluation (reported 
in Table 2), we designed the simulation so that a constant was main- 
tained for a set of different ps, while n was held fixed. The empirical powers 
reported in Table 2 show that the power is quite reflective to the sample 
size n and 1 — r^, namely larger n or large 1 — leads to higher power. This 
is because as decreases, the signal of the test increases. So it becomes 
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Table 2 

Empirical power of the proposed test at $\alpha = 5\%$ for the normal and gamma random
vectors generated according to model (5.1)

                 Normal, p                      Gamma, p
  n      50     100    300    600      50     100    300    600

(a) $H_0: \Sigma = B_1(\Sigma)$ when $\Sigma = B_2(\Sigma)$

$\gamma_1 = \gamma_2 = 1$, $1 - r_1 = 1/14$
  20   0.300  0.313  0.330  0.336    0.315  0.312  0.340  0.312
  40   0.683  0.722  0.711  0.702    0.710  0.721  0.752  0.741
  60   0.962  0.964  0.952  0.954    0.958  0.955  0.950  0.949

$\gamma_1 = 0.5$, $\gamma_2 = 0.25$, $1 - r_1 = 1/35$
  20   0.146  0.144  0.139  0.152    0.148  0.140  0.147  0.143
  40   0.269  0.253  0.258  0.279    0.256  0.281  0.311  0.311
  60   0.406  0.443  0.455  0.451    0.438  0.449  0.458  0.441

(b) $H_0: \Sigma = B_4(\Sigma)$ when $\Sigma = B_5(\Sigma)$ with
$\gamma_1 = \cdots = \gamma_5 = 0.4$, $1 - r_4 = 1/38.05$
  20   0.090  0.112  0.119  0.123    0.096  0.112  0.108  0.118
  40   0.149  0.181  0.178  0.200    0.161  0.169  0.218  0.196
  60   0.261  0.284  0.328  0.314    0.246  0.297  0.290  0.284

easier to distinguish the null hypothesis from the alternative. Once $n$ and
$1 - r_k$ were held fixed, the power was essentially insensitive to $p$,
confirming a remark made at the end of Section 3.
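The signal values quoted in Table 2 can be reproduced numerically under an assumption about model (5.1), which is defined outside this section. The sketch below assumes a moving average with coefficients $\gamma_0, \ldots, \gamma_K$ and unit-variance i.i.d. innovations, so that the lag-$q$ covariance is $c_q = \sum_l \gamma_l\gamma_{l+q}$. The function names and the ratio $c_K^2 / \sum_{q=0}^{K} c_q^2$ are illustrative choices, not the paper's definition of $1 - r_k$; this ratio counts each sub-diagonal once and happens to evaluate to 1/14, about 1/34.8 and 1/38.05 for the three settings.

```python
def ma_autocov(gamma):
    """Lag-q covariances c_q = sum_l gamma[l] * gamma[l + q] of a moving
    average with coefficients gamma[0..K] and unit-variance i.i.d.
    innovations (an assumed form of model (5.1))."""
    K = len(gamma) - 1
    return [sum(gamma[l] * gamma[l + q] for l in range(K - q + 1))
            for q in range(K + 1)]


def outer_band_share(gamma):
    """Share of the squared lag-K covariance in sum_{q=0}^{K} c_q^2.
    Illustrative quantity only; each sub-diagonal is counted once."""
    c = ma_autocov(gamma)
    return c[-1] ** 2 / sum(v * v for v in c)


# The three alternative settings listed in Table 2 (gamma_0 = 1 throughout).
settings = {
    "gamma_1 = gamma_2 = 1": [1.0, 1.0, 1.0],
    "gamma_1 = 0.5, gamma_2 = 0.25": [1.0, 0.5, 0.25],
    "gamma_1 = ... = gamma_5 = 0.4": [1.0] + [0.4] * 5,
}
for label, gamma in settings.items():
    print(label, "-> 1 /", round(1.0 / outer_band_share(gamma), 2))
```

Since the ratio depends only on the coefficients, it does not change with $p$, which is in line with the design in which the signal is held constant across dimensions.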

For bandwidth estimation, we generate $\{X_i\}_{i=1}^n$ according to (5.1). While
we keep $\gamma_0 = 1$, the other coefficients $\gamma_i$ for $i > 0$ are:

Bandwidth 3: $\gamma_i = 1$ for $i = 1, 2, 3$;

Bandwidth 5: $\gamma_i = 0.4$ for $1 \le i \le 5$;

Bandwidth 10: $\gamma_i = 0.2$ for $1 \le i \le 5$ and $\gamma_i = 0.4$ for $6 \le i \le 10$;

Bandwidth 15: $\gamma_i = 0.2$ for $1 \le i \le 10$ and $\gamma_i = 0.4$ for $11 \le i \le 15$.

The covariances have bandwidths 3, 5, 10 and 15, respectively. We evaluate
two bandwidth estimators. One is $\hat{k}_{\delta,\theta}$ given in (4.4) with
$\delta = 0.5$ and $\theta = 0.06$, namely $\hat{k}_{0.5,0.06}$, and the other is
the change-point estimator given in (4.5), applied to candidate $k$s whose
p-values for $H_{0k}$ are larger than $10^{-10}$. We employ the LOESS algorithm
in R to carry out the nonparametric regression estimation to the left of a
$k$, with a default smoothing parameter $h = 0.75$.
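A minimal sketch of a thresholding rule of the kind used for $\hat{k}_{\delta,\theta}$ follows. The exact form of (4.4) lies outside this section, so the rule below is an assumption inferred from the proof of Theorem 3: it returns the smallest $k$ at which the scaled first-order difference $n^{\delta}|T_{nk} - T_{n,k+1}|$ falls below $\theta$. The function name `threshold_bandwidth` and the input sequence are hypothetical.

```python
def threshold_bandwidth(ratios, n, delta=0.5, theta=0.06):
    """Return the smallest k with n**delta * |ratios[k] - ratios[k + 1]| < theta,
    or None if no such k exists. `ratios[k]` stands in for the statistic at
    bandwidth k (assumed rule, not a transcription of (4.4))."""
    scale = n ** delta
    for k in range(len(ratios) - 1):
        if scale * abs(ratios[k] - ratios[k + 1]) < theta:
            return k
    return None


# Hypothetical sequence: large drops up to k = 3, essentially flat afterwards.
ratios = [0.9, 0.5, 0.2, 0.001, 0.0005, 0.0]
print(threshold_bandwidth(ratios, n=100))
```

A larger $\theta$ makes the rule stop earlier, so the tuning pair $(\delta, \theta)$ controls how flat the sequence must be before a bandwidth is declared.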

For each $\Sigma$, we compare the proposed bandwidth estimators with the
estimators advocated in Bickel and Levina (2008a, 2008b) and Rothman,
Levina and Zhu (2010). We choose $n$ to be 20, 40 and 60. For each $n$, $p$
is set to 2, 5 and 10 times $n$, respectively. Following the
Table 3

Averaged empirical bias (standard deviation) of the five bandwidth estimators:
estimator (4.4) with $\delta = 0.5$ and $\theta = 0.06$ (fixed), the change-point
estimator (4.5) (change-point) with $h = 0.75$ and the estimators proposed in
Bickel and Levina (2008a) (BLa), Bickel and Levina (2008b) (BLb) and Rothman,
Levina and Zhu (2010) (RLZ)

                                              Bandwidth
 n    p    Method         3               5               10              15

 20   40   Fixed          0.58 (1.465)    0.07 (0.946)    -0.5 (1.114)    -1.63 (1.931)
           Change-point   0.60 (0.569)    -0.21 (0.518)   -1.48 (2.134)   0.06 (1.734)
           BLa            -0.66 (0.855)   -0.86 (1.287)   -4.72 (2.202)   -9.19 (2.246)
           BLb            0.59 (1.036)    -0.53 (1.460)   -3.97 (2.932)   -6.63 (4.403)
           RLZ            0.11 (1.363)    -0.18 (1.855)   -2.55 (2.732)   -8.02 (2.760)

      100  Fixed          0.14 (0.636)    0.1 (0.659)     -0.22 (0.440)   -0.96 (0.875)
           Change-point   0.56 (0.499)    -0.07 (0.293)   -0.52 (0.882)   0.18 (0.968)
           BLa            -0.09 (1.272)   0.45 (1.617)    -2.33 (2.010)   -6.14 (2.686)
           BLb            0.7 (1.219)     -0.26 (1.561)   -3.88 (2.772)   -7.29 (3.506)
           RLZ            (entries illegible in the source)

      200  Fixed          0.01 (0.1)      0 (0)           -0.12 (0.327)   -0.66 (0.728)
           Change-point   0.67 (0.473)    0 (0)           -0.18 (0.435)   0.09 (0.379)
           BLa            0.78 (2.077)    1.14 (2.327)    -0.58 (2.637)   -2.55 (3.560)
           BLb            1.18 (1.935)    -0.1 (2.302)    -2.91 (2.878)   -6.14 (3.579)
           RLZ            0.55 (1.641)    -0.29 (1.719)   -4.6 (1.928)    -9.51 (1.823)

 40   80   Fixed          0.14 (0.551)    0.08 (0.464)    -0.01 (0.1)     -0.10 (0.302)
           Change-point   0.47 (0.502)    -0.01 (0.100)   -0.12 (0.383)   0.08 (0.273)
           BLa            -0.24 (0.780)   0.23 (1.014)    -1.32 (1.663)   -3.55 (2.907)
           BLb            1.5 (1.514)     0.94 (1.427)    0.06 (2.210)    -0.17 (3.260)
           RLZ            1.05 (1.629)    0.71 (2.222)    0.72 (2.374)    1.28 (3.229)

      200  Fixed          0 (0)           0 (0)           0 (0)           -0.04 (0.197)
           Change-point   0.55 (0.500)    0 (0)           -0.04 (0.281)   0.02 (0.141)
           BLa            0.29 (1.200)    1.03 (1.322)    0.28 (1.633)    -1.30 (2.285)
           BLb            1.64 (1.605)    1.24 (1.837)    0.58 (2.833)    -0.1 (2.976)
           RLZ            1.36 (2.435)    1.16 (2.465)    2.07 (3.647)    1.07 (2.861)

      400  Fixed          0 (0)           0 (0)           0 (0)           0 (0)
           Change-point   0.56 (0.499)    0 (0)           0 (0)           0 (0)
           BLa            0.88 (1.754)    1.5 (1.962)     1.25 (2.240)    0.22 (2.642)
           BLb            2.61 (2.457)    1.74 (2.493)    0.68 (3.396)    0.09 (3.715)
           RLZ            2.19 (2.943)    1.98 (3.369)    1.17 (3.420)    -0.39 (2.821)

settings of Bickel and Levina (2008a, 2008b), $n_1$ is chosen to be $n/3$ and
$n(1 - 1/\log n)$, respectively, and the number of random splits in (4.6) is
$N = 50$.

Table 3 reports the average empirical bias and standard deviation of the
five bandwidth estimators based on 100 replications. We observe from
Table 3 that the overall performance of the proposed estimators is better than
Table 3 (Continued)



                                              Bandwidth
 n    p    Method         3               5               10              15

 60   120  Fixed          0.02 (0.141)    0.08 (0.706)    0.02 (0.2)      -0.01 (0.1)
           Change-point   0.52 (0.502)    0 (0)           0 (0)           -0.01 (0.1)
           BLa            0.22 (0.938)    0.85 (0.989)    0.14 (1.363)    -0.88 (1.659)
           BLb            1.71 (1.458)    1.52 (1.541)    1.67 (2.108)    1.49 (2.615)
           RLZ            1.24 (1.753)    0.71 (1.431)    2.03 (2.683)    2.13 (2.845)

      300  Fixed          0 (0)           0 (0)           0 (0)           0 (0)
           Change-point   0.58 (0.496)    0 (0)           0 (0)           0 (0)
           BLa            0.47 (1.439)    1.56 (1.683)    1.06 (2.136)    0.70 (2.452)
           BLb            2.15 (2.017)    2.04 (2.474)    1.73 (2.877)    1.74 (2.922)
           RLZ            1.68 (2.188)    1.02 (2.383)    2.45 (3.686)    2.75 (3.331)

      600  Fixed          0 (0)           0 (0)           0 (0)           0 (0)
           Change-point   0.54 (0.501)    0 (0)           0 (0)           0 (0)
           BLa            1.05 (1.702)    1.92 (2.102)    2.01 (2.393)    1.06 (2.490)
           BLb            3.16 (2.631)    2.87 (2.699)    2.97 (3.532)    1.33 (3.254)
           RLZ            3.3 (3.721)     3.23 (3.787)    3.82 (4.029)    2.7 (3.506)

those of Bickel and Levina (2008a, 2008b) and Rothman, Levina and Zhu
(2010), with smaller standard deviation and bias. Moreover, as $n$ increases,
both the bias and the standard deviation of the proposed estimators decrease,
and both are quite robust to $p$. For the estimators of Bickel and Levina
(2008a, 2008b) and Rothman, Levina and Zhu (2010), the bias and the standard
deviation can increase with $p$, and are much larger than those of the
proposed estimators. This is likely caused by the problems associated with
the sample covariance matrix when the data dimension is high.
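The summaries reported in Table 3 can be computed as in the following sketch. It assumes that the reported standard deviation is the sample standard deviation (divisor $R - 1$ over $R$ replications); this convention is not stated in the text, although it is consistent with entries such as 0.01 (0.1), 0.02 (0.141) and -0.04 (0.197) at 100 replications. The replication outcomes below are hypothetical.

```python
from statistics import mean, stdev


def bias_and_sd(estimates, true_bandwidth):
    """Empirical bias and sample standard deviation of bandwidth estimates."""
    errors = [e - true_bandwidth for e in estimates]
    return mean(errors), stdev(errors)


# Hypothetical outcome: 99 of 100 replications recover bandwidth 3, one returns 4.
estimates = [3] * 99 + [4]
bias, sd = bias_and_sd(estimates, true_bandwidth=3)
print(f"{bias:.2f} ({sd:.3f})")
```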

6. Empirical study. In this section, we report an empirical study on
a prostate cancer data set [Adam et al. (2003)] from protein mass
spectroscopy, which aimed to distinguish healthy people from those
with cancer by analyzing the constituents of the proteins in the blood.
Adam et al. (2003) recorded, for each blood serum sample $i$, the intensity
at a large number of time-of-flight values $t_j$. The time of flight is related
to the mass over charge ratio $m/z$ of the constituent proteins. They collected
the intensity at a total of 48,538 $m/z$-sites, and the full data set consisted
of 157 healthy patients and 167 with cancer.

Tibshirani et al. (2005) analyzed the data by the fused Lasso. They
ignored $m/z$-ratios below 2000 to avoid chemical artifacts, and averaged the
intensity recordings in consecutive blocks of 20. These gave rise to a total




[Figure 2, panels (c) "First order differences - healthy group" and (d) "First order differences - cancer group"; horizontal axis: Bandwidth; curves: original and fitted.]

Fig. 2. Test statistics, p-values and the first order differences $d_{nk}$ for the healthy and the
cancer groups for bandwidths larger than 50. The p-values of the test for $H_{0k}$ for $k < 50$
are too small to be considered for bandwidth estimation.



of 2181 dimensions per observation. Levina, Rothman and Zhu (2008) esti- 
mated the inverse of the covariance matrix of the intensities by an adaptive 
banding approach with a nested Lasso penalty. They carried out additional 
averaging of the data of Tibshirani et al. (2005) in consecutive blocks of 10, 
resulting in a total of 218 dimensions. We considered the standardized data 
of Levina, Rothman and Zhu (2008), and tested for the banded structure of 
the covariance matrix of the intensities. 
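The block averaging used to reduce the dimension can be sketched as follows. The handling of a trailing partial block is an assumption (it is dropped here), chosen because 2181 values in blocks of 10 then yield the 218 dimensions reported; the input values are placeholders, not the real spectra.

```python
def block_average(values, block):
    """Average consecutive non-overlapping blocks of length `block`.
    A trailing partial block is dropped (assumption, see lead-in)."""
    n_blocks = len(values) // block
    return [sum(values[b * block:(b + 1) * block]) / block
            for b in range(n_blocks)]


intensities = [float(i) for i in range(2181)]  # placeholder values
reduced = block_average(intensities, 10)
print(len(reduced))
```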

The test statistics, p-values and the first order differences $d_{nk}$ for the
healthy and cancer groups are displayed in Figure 2 for bandwidths $k > 50$.
Bandwidths less than 50 are not displayed since the values of the test
statistics are too large, and the associated p-values for $H_{0k}$ are
too small, for $k < 50$. These bandwidth estimates, together with the shapes
of the curves for the test statistics and the p-values in Figure 2, suggest that
the covariance matrix of the healthy group is likely to be banded, while
the covariance of the cancer group may not be banded at all, given the
very large bandwidth and the shape of the curve. For the cancer group, as
shown in Figure 2, the test statistics are relatively flat for $120 \le k \le 140$,
and then fall sharply afterward, which indicates relatively small values in
the covariance matrix from sub-diagonal 120 to 140. However, there is a
substantial contribution from sub-diagonals for $k > 140$. These features are
echoed in the p-values displayed in panel (b), which are almost stationary
within the above-mentioned range and then increase sharply. Panel (d) of
Figure 2 displays a rather unsettled curve for $d_{nk}$, the difference between
successive statistics $T_{nk}$. All of this is in sharp contrast to the healthy
group, indicating rather different covariance structures between the two groups.

At $\alpha = 5\%$, we reject $H_{k0}$ when the statistic is larger than 3.29. For
the healthy group, the smallest $k$ such that $H_{k0}$ is not rejected is $k = 116$,
while for the cancer group it is 191. We apply the bandwidth estimator (4.4)
with $\delta = 0.5$ and $\theta = 0.005$. The estimated bandwidth for the healthy
group is 121 and for the cancer group is 212. At the same time, the bandwidth
estimates obtained by Bickel and Levina's (2008a) approach are 144 for
the healthy group and 193 for the cancer group. The one for the healthy
group is much larger than the 121 we obtained earlier using the estimator
(4.4). We then apply the proposed regression change-point bandwidth
estimator over a range of bandwidths whose associated p-values for testing
$H_{k0}$ are larger than $10^{-10}$. For the healthy group, the bandwidth range
is $k > 85$; for the cancer group the range is $k > 150$. We set the smoothing
parameter $h = 0.75$ in the LOESS procedure in R. The regression bandwidth
estimator is $\hat{k}_h = 127$ for the healthy group, which is slightly larger
than the 121 obtained from the estimator (4.4). For the cancer group, the
estimated bandwidth is 215. This rather large estimated bandwidth suggests
that, compared to the healthy group, there is substantially more dependence
among the protein mass spectroscopy measurements of the cancer patients,
and, in particular, the covariance may not be banded at all for this
group of patients.
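The rejection threshold 3.29 is numerically twice the upper 5% quantile of the standard normal distribution. The sketch below reproduces it under the assumption that the standardized statistic has an $N(0, 4)$ limit under the null hypothesis; this is consistent with the leading term $4n^{-2}\operatorname{tr}^2(\Sigma^2)$ in Lemma 1, but the statistic itself is defined outside this section. The helper `p_value` is hypothetical.

```python
from statistics import NormalDist

alpha = 0.05
z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper 5% standard normal quantile
critical_value = 2 * z_alpha                # assumes an N(0, 4) null limit
print(round(z_alpha, 3), round(critical_value, 2))


def p_value(statistic):
    """One-sided p-value under the assumed N(0, 4) null limit."""
    return 1 - NormalDist(mu=0, sigma=2).cdf(statistic)
```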



APPENDIX 



We first introduce some notation. For $q = 0, \ldots, p-1$, define

$$B_{1,q} = \frac{1}{P_n^2}\sum_{l=1}^{p-q}\sum_{i,j}^{*}(X_{il}X_{il+q})(X_{jl}X_{jl+q}),$$

$$B_{2,q} = \frac{1}{P_n^3}\sum_{l=1}^{p-q}\sum_{i,j,k}^{*}X_{il}X_{kl+q}(X_{jl}X_{jl+q})$$

and

$$B_{3,q} = \frac{1}{P_n^4}\sum_{l=1}^{p-q}\sum_{i,j,k,m}^{*}X_{il}X_{jl+q}X_{kl}X_{ml+q}.$$

Then, $V_{nk} = B_{1,0} - 2B_{2,0} + B_{3,0} + 2\sum_{q=1}^{k}(B_{1,q} - 2B_{2,q} + B_{3,q})$, and
$W_{nk} = 2\sum_{q=k+1}^{p-1}(B_{1,q} - 2B_{2,q} + B_{3,q})$. Let $C_{nk} = 2\sum_{q=k+1}^{p-1}B_{1,q}$ and
$U_i = B_{i,0} + 2\sum_{q=1}^{p-1}B_{i,q}$ for $i = 1, 2, 3$. We first establish some lemmas for later use.

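Lemma 1. Under Assumptions 1 and 2, $\operatorname{Var}(C_{nk}) = v_{nk}^2 + o\{n^{-2}\operatorname{tr}^2(\Sigma^2)\}$.

Proof. Since $C_{nk} = (P_n^2)^{-1}\sum_{i,j}^{*}\sum_{|l_1-l_2|>k}X_{il_1}X_{il_2}X_{jl_1}X_{jl_2}$, by the
independence between different observations, we have

$$E(C_{nk}) = (P_n^2)^{-1}\sum_{i,j}^{*}\sum_{|l_1-l_2|>k}E(X_{il_1}X_{il_2})E(X_{jl_1}X_{jl_2}) = \sum_{|l_1-l_2|>k}\sigma_{l_1l_2}^2.$$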

Note that 

ii.ii «2,j2 la|>fc |/3— U|>fc 

Let/; 

1I3&I2I4 ' 

a hU a h i s . Then, E(C 2 fc ) = (P 2 )~ 2 (L n i + L n2 + L n3 ), where 



Lnl-Pn Y Yl a hh a hln 



,2 ,2 

|/i-i2|>*i \h-U\>k 

L n2 = 4Pn Y Y (^fhhhk + <J hh <J hu[^\) (T hl2 (J hU 
\h— h\>k \h— U\>k 

and 

L n3 = 2P 2 Y Y (^fhhhU + ^hh^hhM) 2 • 
\h-h\>k \l 3 -U\>k 

We compute L n2 and L n3 part by part. First, note that 

Y Y fhhlsU <T hl3 a 'hh = tr ( A2 °A 2 )-2 Y Y fkhhU^hh^hU 
\h-h\>k \l 3 -k\>k \li-h\<kl3,h 

+ Y Y fhhhU a hh a hli- 
\h-h\<k\l 3 -U\<k 

By the Cauchy-Schwarz inequality, 



Y Yl fhhhu^h^hh 

\h-l 2 \<khM 



< tr 1 / 2 (r 2 )tr 1 / 2 [(sr o sr){(£r)' o (sr)'}] 
and

E E fhhhhWhh <(2fc + i) 2 tr{(ror)(r'or')(sos)}, 

\h-h\<k\h-l A \<k 
where T = (T o T)(T' o Y'). Note that 

tr(T) < tr(£ 2 ), tr{(r o T)(V' o o E)} < tr(£ 4 ) 

and 

tr[(sr o Er){(Er)' o (sr)'}] < tr(s 6 ). 

Since tr(E 6 ) < tr(E 2 ) tr(£ 4 ), k = o(p 1//4 ) and from Assumption 2, it follows 
that 

E E fhhhh^hh^hh = o{ tr2 ( s2 )} 

\h-l 2 \<kh,h 

and 

E E fhhhU a hh a hh =°{ tr2 ( s2 )}- 

|il-Z 8 |>*i |«3-k|>fc 

Similarly, it can be shown that 

E (S 2 ) 2 li2 =o{tr 2 (S 2 )}, 
|ii-i 2 |<fc 

E E °liA s =o{^ 2 )}, 

\h-h\<k\h-U\<k 

E (S 2 ) Wl (E 2 )y 2 =o{tr 2 (S 2 )}, 
\h-h\<k 

E E /f lW4 = ^tr 2 (E 2 )} 
|ii-ia|>fc \h-h\>k 

and 

E E ^ih^hu^hu^hh = °{ tr2 ( s2 )}- 

\h-h\<k\l3-U\<k 
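By combining these together,

$$\operatorname{Var}(C_{nk}) = 4n^{-2}\operatorname{tr}^2(\Sigma^2) + 8n^{-1}\sum_{|l_1-l_2|>k}\sum_{|l_3-l_4|>k}\sigma_{l_1l_2}\sigma_{l_3l_4}\sigma_{l_1l_3}\sigma_{l_2l_4}$$
$$\qquad + 4\Delta n^{-1}\sum_{|l_1-l_2|>k}\sum_{|l_3-l_4|>k}f_{l_1l_2l_3l_4}\sigma_{l_1l_2}\sigma_{l_3l_4} + o(n^{-2}\operatorname{tr}^2(\Sigma^2)).$$

It can be checked that

$$\sum_{|l_1-l_2|>k}\sum_{|l_3-l_4|>k}\sigma_{l_1l_2}\sigma_{l_3l_4}\sigma_{l_1l_3}\sigma_{l_2l_4} = \operatorname{tr}[\{\Sigma(\Sigma - B_k(\Sigma))\}^2]$$

and

$$\sum_{|l_1-l_2|>k}\sum_{|l_3-l_4|>k}f_{l_1l_2l_3l_4}\sigma_{l_1l_2}\sigma_{l_3l_4} = \operatorname{tr}(\Gamma'(\Sigma - B_k(\Sigma))\Gamma \circ \Gamma'(\Sigma - B_k(\Sigma))\Gamma).$$

Therefore, $\operatorname{Var}(C_{nk}) = v_{nk}^2 + o\{n^{-2}\operatorname{tr}^2(\Sigma^2)\}$. □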

Lemma 2. Under Assumptions 1 and 2, for $q = 0, \ldots, k$,
$\operatorname{Var}(B_{2,q}) = O\{n^{-2}\operatorname{tr}^{1/2}(\Sigma^4)\operatorname{tr}(\Sigma^2)\}$ and $\operatorname{Var}(B_{3,q}) = O\{n^{-4}\operatorname{tr}(\Sigma^4)\}$.

Proof. First consider $B_{2,q}$. Since $E(B_{2,q}) = 0$ for any $q = 0, \ldots, k$, we only
need to calculate $E(B_{2,q}^2)$. Note that we can decompose $B_{2,q}^2$ as

$$B_{2,q}^2 = (P_n^3)^{-2}\Bigl(\sum_{i=1}^{2}B_{2,q,a_i} + \sum_{i=1}^{3}B_{2,q,b_i} + \sum_{i=1}^{2}B_{2,q,c_i}\Bigr),$$

where

$$B_{2,q,a_1} = \sum_{l_1,l_2=1}^{p-q}\sum_{i,k,j_1,j_2}^{*}(X_{il_1}X_{il_2})(X_{kl_1+q}X_{kl_2+q})(X_{j_1l_1}X_{j_1l_1+q})(X_{j_2l_2}X_{j_2l_2+q}),$$

$$B_{2,q,a_2} = \sum_{l_1,l_2=1}^{p-q}\sum_{i,k,j_1,j_2}^{*}(X_{il_1}X_{il_2+q})(X_{kl_1+q}X_{kl_2})(X_{j_1l_1}X_{j_1l_1+q})(X_{j_2l_2}X_{j_2l_2+q}),$$

$$B_{2,q,b_1} = \sum_{l_1,l_2=1}^{p-q}\sum_{i,j,k}^{*}(X_{il_1}X_{il_2}X_{il_2+q})(X_{jl_1}X_{jl_1+q}X_{jl_2})X_{kl_1+q}X_{kl_2+q},$$

$$B_{2,q,b_2} = 2\sum_{l_1,l_2=1}^{p-q}\sum_{i,j,k}^{*}(X_{il_1}X_{il_2}X_{il_2+q})(X_{jl_1}X_{jl_1+q}X_{jl_2+q})X_{kl_1+q}X_{kl_2},$$

$$B_{2,q,b_3} = \sum_{l_1,l_2=1}^{p-q}\sum_{i,j,k}^{*}(X_{il_1+q}X_{il_2}X_{il_2+q})(X_{jl_1}X_{jl_1+q}X_{jl_2+q})X_{kl_1}X_{kl_2},$$

$$B_{2,q,c_1} = \sum_{l_1,l_2=1}^{p-q}\sum_{i,j,k}^{*}(X_{il_1}X_{il_2})(X_{kl_1+q}X_{kl_2+q})(X_{jl_1}X_{jl_1+q}X_{jl_2}X_{jl_2+q})$$

and

$$B_{2,q,c_2} = \sum_{l_1,l_2=1}^{p-q}\sum_{i,j,k}^{*}(X_{il_1}X_{il_2+q})(X_{kl_1+q}X_{kl_2})(X_{jl_1}X_{jl_1+q}X_{jl_2}X_{jl_2+q}).$$

We need to show that the expectations of all the terms above are controlled
by the order $n^4\operatorname{tr}^{1/2}(\Sigma^4)\operatorname{tr}(\Sigma^2)$. First, note that
$E(B_{2,q,a_1}) = P_n^4\sum_{l_1,l_2=1}^{p-q}\sigma_{l_1l_2}\sigma_{l_1+q\,l_2+q}\sigma_{l_1l_1+q}\sigma_{l_2l_2+q}$.
By the Cauchy-Schwarz inequality, it can be shown that

$$|E(B_{2,q,a_1})| = P_n^4\,O(\operatorname{tr}^{1/2}(\Sigma^4)\operatorname{tr}(\Sigma^2)).$$

Employing a similar derivation, we can show that the same result holds
for all the other terms, which leads to the first part of Lemma 2. The second
part can be proved along the same lines. □

Lemma 3. Under Assumptions 1 and 2, $\operatorname{Var}(U_i) = o\{n^{-2}\operatorname{tr}^2(\Sigma^2)\}$ for
$i = 2, 3$.

Proof. The proof is similar to that of Lemma 2. □

Lemma 4. Under Assumptions 1 and 2, $\operatorname{Var}(\sum_{q=k+1}^{p-1}B_{i,q}) = o\{n^{-2}\operatorname{tr}^2(\Sigma^2)\}$
for $i = 2, 3$.

Proof. Noting that $2\sum_{q=k+1}^{p-1}B_{i,q} = U_i - B_{i,0} - 2\sum_{q=1}^{k}B_{i,q}$, the lemma follows
by applying Lemmas 2 and 3, $k = o(p^{1/4})$ and Assumption 2. □

In the following, we provide the proofs of Propositions 1 and 2.

Proof of Proposition 1. Rewrite $W_{nk}$ as

$$W_{nk} = C_{nk} - 4\sum_{q=k+1}^{p-1}B_{2,q} + 2\sum_{q=k+1}^{p-1}B_{3,q}.$$

Since $E(C_{nk}) = \sum_{|l_1-l_2|>k}\sigma_{l_1l_2}^2 = \operatorname{tr}[\{\Sigma - B_k(\Sigma)\}^2]$ and $E(B_{i,q}) = 0$ for $i = 2, 3$
and any $q = 0, 1, \ldots, p-1$, the first statement is readily obtained. The
second statement follows by applying Lemmas 1 and 4 and the fact that
$v_{nk}^2 \ge 4n^{-2}\operatorname{tr}^2(\Sigma^2)$. □
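The identity $\sum_{|l_1-l_2|>k}\sigma_{l_1l_2}^2 = \operatorname{tr}[\{\Sigma - B_k(\Sigma)\}^2]$ used in the proof of Proposition 1 can be checked on a small example. The sketch below uses a hypothetical $4 \times 4$ banded matrix and plain Python lists; `band` and `trace_of_square` are illustrative helper names.

```python
def band(sigma, k):
    """Banding operator B_k: keep entries with |i - j| <= k, zero the rest."""
    p = len(sigma)
    return [[sigma[i][j] if abs(i - j) <= k else 0.0 for j in range(p)]
            for i in range(p)]


def trace_of_square(a):
    """tr(A^2) = sum_{i,j} a_ij * a_ji."""
    p = len(a)
    return sum(a[i][j] * a[j][i] for i in range(p) for j in range(p))


# Hypothetical covariance with bandwidth 2.
sigma = [[3.0, 2.0, 1.0, 0.0],
         [2.0, 3.0, 2.0, 1.0],
         [1.0, 2.0, 3.0, 2.0],
         [0.0, 1.0, 2.0, 3.0]]
k = 1
b_k = band(sigma, k)
residual = [[sigma[i][j] - b_k[i][j] for j in range(4)] for i in range(4)]
off_band = sum(sigma[i][j] ** 2
               for i in range(4) for j in range(4) if abs(i - j) > k)
print(trace_of_square(residual), off_band)
```

Both quantities vanish when $k$ is at least the true bandwidth, which is why $E(W_{nk})$ carries the signal against the banded null.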

Proof of Proposition 2. It can be carried out following the same
routes as those in Lemmas 1 and 2. Specifically, it can be shown that
$\operatorname{Var}(V_{nk}) = O\{a_{np}\operatorname{tr}^2(\Sigma^2)\}$. Hence, $\operatorname{Var}\{V_{nk}/\operatorname{tr}(\Sigma^2)\} = O(a_{np}) \to 0$. □

It is clear from the proof of Proposition 1 that $W_{nk} = C_{nk} + o_p(v_{nk})$.
Therefore, in order to derive the asymptotic distribution of the statistic, we
only need to consider the asymptotic normality of $C_{nk}$. Let $\mathcal{F}_0 = \{\emptyset, \Omega\}$
and $\mathcal{F}_t = \sigma\{X_1, \ldots, X_t\}$ for $t = 1, 2, \ldots, n$ be a sequence of $\sigma$-fields generated
by the data sequence. Let $E_t(\cdot)$ denote the conditional expectation with
respect to $\mathcal{F}_t$. Write $C_{nk} - E(C_{nk}) = \sum_{t=1}^{n}D_{tk}$, where $D_{tk} = (E_t - E_{t-1})C_{nk}$.
Then for every $n$, $D_{tk}$, $1 \le t \le n$, is a martingale difference sequence with
respect to the $\sigma$-fields $\{\mathcal{F}_t\}_{t=0}^{n}$.

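Lemma 5. Let $\sigma_{tk}^2 = E_{t-1}(D_{tk}^2)$. Under Assumptions 1 and 2, as $n \to \infty$,

$$\text{(A.1)}\qquad \frac{\sum_{t=1}^{n}\sigma_{tk}^2}{\operatorname{Var}(C_{nk})} \xrightarrow{p} 1 \quad\text{and}\quad \frac{\sum_{t=1}^{n}E(D_{tk}^4)}{\operatorname{Var}^2(C_{nk})} \to 0.$$

Proof. We first establish the first part of (A.1). Noting that $E(\sum_{t=1}^{n}\sigma_{tk}^2) =
\operatorname{Var}(C_{nk})$, we need only to show $\operatorname{Var}(\sum_{t=1}^{n}\sigma_{tk}^2) = o(\operatorname{Var}^2(C_{nk}))$. Note that

$$D_{tk} = \frac{2}{n(n-1)}\sum_{|l_1-l_2|>k}\Bigl\{(X_{tl_1}X_{tl_2} - \sigma_{l_1l_2})\sum_{i=1}^{t-1}(X_{il_1}X_{il_2} - \sigma_{l_1l_2})\Bigr\}
 + \frac{2}{n}\Bigl(\sum_{|l_1-l_2|>k}X_{tl_1}X_{tl_2}\sigma_{l_1l_2} - \sum_{|l_1-l_2|>k}\sigma_{l_1l_2}^2\Bigr).$$

Denote $Q_{t-1}^{l_1l_2} = \sum_{i=1}^{t-1}(X_{il_1}X_{il_2} - \sigma_{l_1l_2})$. Let $Q_{t-1}$ be the matrix with the
$(l_1, l_2)$th entry being $Q_{t-1}^{l_1l_2}$ and $M_{t-1} = \Gamma'Q_{t-1}\Gamma$; then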

n 3 3 4 4 

t=l i=l i=l i=l i=l 

where 7 is a constant and 

4 n 

R ^ = —( n2E tr ( M *-i)' 

n z [n — l) z L — ' 

v > t=i 
8 n 

Rl2 = ~n 2 (n-l) 2 Y Y Qt-i(^Qt-iZ)hi 2 , 

V ' t=l \h-l 2 \<k 

4 n 

Rl3 = n 2 (n-l) 2 Y Y Y Qt-iQt-i^hh^hU, 
v ; t=l\l 1 -l 3 \<k\l 3 -h\<k 

4 n 

R21 = ^r, tto Vtr(M t _i o Mt-i), 

y ' t=l 



R 



—[viYY Y Qt-l M ™T r hm r i2m 
1 t=l m \h-lo\<k 



n 2 (n 

\h~h\<k 



R23 ~ n 2( n _ 1)2 Y,Y Y Y Qt-lQt^hm^hm^hm^hm, 



m \h-l 2 \<k\l3-k\<k 



R31 = —, Vtr(SQ f _!S 2 ), 

n^yn — 1) ^ 

8 ^ 

Rd2 = ~n 2 (n-l) Y Y Qt-l^hh, 
v ; t=i \h-l 2 \<k 
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R 



,2 fn 8 _ d Y Y CZQt-iX) k i 2 a h i 



t=i |«i-i 2 |<fc 



n 2 (n — l)^—' ^—f ^—f 

v ' i=l |; 1 _/ 2 |<fc |; 3 _m<fc 



fl 4i = -57 — Vtr(A/ f _! o ,4 2 ), 

n z (n — 1) 
y ' t=i 

8 n 

Ri2 = ~n 2 (n-l) YY Y Q\-l T hrrFl 2 m{A 
V ' t=l m \h-l 2 \<k 



2\ 

mm i 



R43 

and 



n 2 (n 



Y^YY Y CT W2 r /im r z 2 m M t-? 

> t=l m |i 1 _i 2 |<fc 



RM ~ n 2 (n-l) YY Y Y Q l t-\ a hlA V hm^hmThm^l A m- 

\ ' t=l m \h-l 2 \<k\l 3 -h\<k 



To prove $\operatorname{Var}(\sum_{t=1}^{n}\sigma_{tk}^2) = o(\operatorname{Var}^2(C_{nk}))$, we intend to prove that the variance
of each $R_{ij}$ is of smaller order than $n^{-4}\operatorname{tr}^4(\Sigma^2)$.
For $R_{12}$, denote for any $1 \le i, j \le n$,

Y if = Y ( X ih X ih ~ CF hh){(^' X j X j^')hl2 - ( s3 )/i/ 2 l- 
\h-h\<k 

Then Eih-bi^OKCSOf-iSJhh = ££W + Efc-^ 2 - Note that 
E^- 2 = for any i / j and E^^Y^J =0 for any (h,i2,ji,h), except 
{n = «2,il = J2} and {ii = jt,i 2 = Thus for any t < I, 

Covf ]T Q^EQ^E),^, Q l A(^Ql-i^)hh) 
\h-h\<k \h-h\<k 



= (t- l)Vtx(Ytf) + (t - l)(t - 2)Var(y 1 i 2 2 ). 

We only need to verify that Var(y{ 2 ) and Var(y{ 2 ) are of small orders of 
tr 4 (S 2 ). Note that 

E(Y 1 \ 2 ) 2 = E ^ (Xi^Xi/jj -cri 1 / 2 )(Z 1 / 3 Xm-cr/ 3 i 4 ) 

|h-/2|<fc|l3-/4|<fc 

x {(EX^E)^ - (E 3 )^} 
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<7l2 E E {^Iz+VhhVhhf^i^U+VhhVkh) 1 ^ 

\h-h\<k\h-U\<k 

x{(S 3 )^ 4 + (S 3 ) j3i3 (E 3 ) W4 } 1 / 2 

<7i2 E «h + °hh°hh) E {(S 3 )f i;2 + (S 3 ) yi (S 3 )y 2 } 
\h-h\<k \h-h\<k 

< 7l2 (2fc + l) 2 tr(£ 2 )tr(£ 6 ), 
where 712 is a constant. Since tr(£ 6 ) < tr 3 / 2 (E 4 ), 

(2k + l) 2 tr(£ 2 ) tr(£ 6 ) = 0{k 2 tr(£ 2 ) tr 3 / 2 (£ 4 )} 
= 0{A;V 3 / 2 tr 4 (S 2 )} 
= o{tr 4 (£ 2 )}, 

which indicates that Var(y i \ 2 ) = o{tr 4 (S 2 )}. Similarly, we can also show that 
Var(y i 1 2 2 ) = o{tr 4 (S 2 )}. Thus 

P.A ( n ^ 

Var(^ 12 )= Var £ Q^Qt-l^hlA 

v ' U=l |Zi-Z 2 |<fc J 

= o{n" 4 tr 4 (S 2 )}. 

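Following the same procedure, we can prove that for all the other $R_{ij}$,
$\operatorname{Var}(R_{ij}) = o\{n^{-4}\operatorname{tr}^4(\Sigma^2)\}$. Since $\operatorname{Var}^2(C_{nk}) \ge n^{-4}\operatorname{tr}^4(\Sigma^2)$, we have
$\operatorname{Var}(R_{ij}) = o\{\operatorname{Var}^2(C_{nk})\}$. Thus we have $\operatorname{Var}(\sum_{t=1}^{n}\sigma_{tk}^2) = o(\operatorname{Var}^2(C_{nk}))$, and
hence the first part of (A.1).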
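The two trace bounds invoked in the proofs of Lemmas 1 and 5, $\operatorname{tr}(\Sigma^6) \le \operatorname{tr}^{3/2}(\Sigma^4)$ and $\operatorname{tr}(\Sigma^6) \le \operatorname{tr}(\Sigma^2)\operatorname{tr}(\Sigma^4)$, follow from the eigenvalues of $\Sigma$. A short derivation, writing $\lambda_1, \ldots, \lambda_p \ge 0$ for these eigenvalues (notation introduced here for illustration):

```latex
\operatorname{tr}(\Sigma^6)
  = \sum_{i=1}^{p} \lambda_i^4 \, \lambda_i^2
  \le \sum_{i=1}^{p} \lambda_i^4 \Bigl(\sum_{j=1}^{p} \lambda_j^4\Bigr)^{1/2}
  = \operatorname{tr}^{3/2}(\Sigma^4),
\qquad
\operatorname{tr}(\Sigma^6)
  = \sum_{i=1}^{p} \lambda_i^2 \lambda_i^4
  \le \sum_{i=1}^{p}\sum_{j=1}^{p} \lambda_i^2 \lambda_j^4
  = \operatorname{tr}(\Sigma^2)\operatorname{tr}(\Sigma^4).
```

The first bound uses $\lambda_i^2 = (\lambda_i^4)^{1/2} \le (\sum_j \lambda_j^4)^{1/2}$; the second only adds the nonnegative cross terms with $i \ne j$.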

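For the second part of (A.1), by simple algebra, we can rewrite $D_{tk}$ as
$D_{tk} = S_{t1} - S_{t2} + S_{t3} - S_{t4}$, where

$$S_{t1} = \frac{2}{n(n-1)}\{X_t'Q_{t-1}X_t - \operatorname{tr}(Q_{t-1}\Sigma)\},$$

$$S_{t2} = \frac{2}{n(n-1)}[X_t'B_k(Q_{t-1})X_t - \operatorname{tr}\{B_k(Q_{t-1})\Sigma\}],$$

$$S_{t3} = \frac{2}{n}\{X_t'\Sigma X_t - \operatorname{tr}(\Sigma^2)\}$$

and

$$S_{t4} = \frac{2}{n}[X_t'B_k(\Sigma)X_t - \operatorname{tr}\{B_k(\Sigma)\Sigma\}].$$

Since $D_{tk}^4 \le \gamma(S_{t1}^4 + S_{t2}^4 + S_{t3}^4 + S_{t4}^4)$ for a positive constant $\gamma$, we have

$$\sum_{t=1}^{n}E(D_{tk}^4) \le \gamma\Bigl\{\sum_{t=1}^{n}E(S_{t1}^4) + \sum_{t=1}^{n}E(S_{t2}^4) + \sum_{t=1}^{n}E(S_{t3}^4) + \sum_{t=1}^{n}E(S_{t4}^4)\Bigr\}.$$

In the following, we will prove that each of the four terms on the right is of
smaller order than $\operatorname{Var}^2(C_{nk})$. To this end, note that

$$E\{X_t'Q_{t-1}X_t - \operatorname{tr}(Q_{t-1}\Sigma)\}^4 \le \gamma_1E\{\operatorname{tr}^2(M_{t-1}^2)\},$$

where $\gamma_1$ is a positive constant. Since $E\{\operatorname{tr}(M_{t-1}^2)\} = (t-1)O\{\operatorname{tr}^2(\Sigma^2)\}$
and $\operatorname{Var}\{\operatorname{tr}(M_{t-1}^2)\} = t^2O(\operatorname{tr}^2(\Sigma^2)\operatorname{tr}(\Sigma^4))$, we have
$E\{\operatorname{tr}^2(M_{t-1}^2)\} = t^2 \times O\{\operatorname{tr}^4(\Sigma^2)\}$. Thus,

$$\sum_{t=1}^{n}E(S_{t1}^4) = \frac{16}{n^4(n-1)^4}\sum_{t=1}^{n}E\{X_t'Q_{t-1}X_t - \operatorname{tr}(Q_{t-1}\Sigma)\}^4
 \le \frac{16}{n^4(n-1)^4}\sum_{t=1}^{n}t^2\,O\{\operatorname{tr}^4(\Sigma^2)\} = O\{n^{-5}\operatorname{tr}^4(\Sigma^2)\} = o\{\operatorname{Var}^2(C_{nk})\}.$$

Similarly, we can show that for $i = 2, 3$ and 4, $\sum_{t=1}^{n}E(S_{ti}^4) = o\{\operatorname{Var}^2(C_{nk})\}$.
Combining all the four parts together, we have $\sum_{t=1}^{n}E(D_{tk}^4) = o\{\operatorname{Var}^2(C_{nk})\}$,
which leads to the second part of (A.1). □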

Denote $I_{nk} = \{W_{nk} - E(W_{nk})\}/V_{nk}$ and $J_{nk} = E(W_{nk})/V_{nk}$. Then $T_{nk} =
I_{nk} + J_{nk}$. For $k_0$ diverging, but satisfying (4.1), we intend to prove that
$n^{\delta}(J_{nk} - J_{nk+1})$ diverges to $\infty$ uniformly on $k < k_0$ for any $\delta > 0$, and that
$n^{\delta}I_{nk}$ converges uniformly to 0 in probability for any $\delta < 1/2$ and $k \le M$, where
$M > k_0$ and $M = o(p^{1/4})$.

Lemma 6. Under Assumptions 1, 2 and (4.1), if $\liminf_n\{\inf_{k<k_0}(r_{k+1} - r_k)\} > 0$
and $\{\sigma_{ll}\}_{l=1}^{p}$ are uniformly bounded away from 0 and $\infty$, then for any
$\delta < 0.5$, as $n \to \infty$:

(a) $P(n^{\delta}(J_{nk} - J_{nk+1}) > \xi$ for any $k < k_0) \to 1$ for any $\xi > 0$;

(b) $P(n^{\delta}|I_{nk}| < \varepsilon$ for any $k < k_0) \to 1$ for any $0 < \varepsilon < 1$;

(c) $P(n^{\delta}|I_{nk}| < \varepsilon$ for any $k_0 \le k \le M) \to 1$ for any $0 < \varepsilon < 1$, where
$k_0 < M$ and $M = o(p^{1/4})$.

Proof. (a) If $\{\sigma_{ll}\}_{l=1}^{p}$ is bounded away from $\infty$, similarly to the proofs of
Lemmas 1 and 2, it can be checked that $\operatorname{Var}(V_{nk}) = O(k^2\operatorname{tr}(\Sigma^2)/n)$. Therefore,
by Chebyshev's inequality, for any $\varepsilon > 0$,

Vnk — E(V n fc) 



tr(S 2 ) 



2 . Var(y nfc ) Ck 2 Ck 2 k% 



e 2 tr 2 (S 2 )r^ e 2 npr i k e 2 np ' 
where the last inequality comes from the fact that $r_k^{-1} \le 2k_0 + 1$. Hence,

Vnk ~ HVnk) 



P[ max 

V 0<k<k 



tr(S 2 )r 2 



<e) >1- V^^>1 



^ e A np e 2 np : 



k 

which converges to 1 since $k_0$ satisfies (4.1). Consider $\varepsilon < 1/2$, and denote
$\Omega = \{\omega: |V_{nk} - E(V_{nk})| < \varepsilon r_k^2\operatorname{tr}(\Sigma^2)$, for any $k < k_0\}$.
By the above argument, $P(\Omega) \to 1$ as $n \to \infty$. For any $\omega \in \Omega$, we have

$$1 - \varepsilon r_k < 1/(1 + \varepsilon r_k) < \operatorname{tr}[\{B_k(\Sigma)\}^2]/V_{nk} < 1/(1 - \varepsilon r_k) < 1 + 2\varepsilon r_k$$

for any $k < k_0$. Hence, for any $\omega \in \Omega$,

$$n^{\delta}(J_{nk} - J_{nk+1}) > n^{\delta}(r_{k+1} - r_k) + n^{\delta}(\varepsilon r_k + 2\varepsilon r_{k+1} - 3\varepsilon)
 > n^{\delta}(r_{k+1} - r_k) - 3n^{\delta}\varepsilon,$$

which implies that $n^{\delta}(J_{nk} - J_{nk+1})$ diverges uniformly on $k < k_0$, by choosing
$\varepsilon$ small enough. Therefore, for any $\xi > 0$, by choosing $\varepsilon$ small enough,
there exists an $N > 0$ such that for any $n > N$,

$$P\{n^{\delta}(J_{nk} - J_{nk+1}) > \xi \text{ for any } k < k_0\} \ge P(\Omega).$$

The conclusion follows by noting that $P(\Omega) \to 1$ as $n \to \infty$. The other two
parts of the conclusion can be obtained similarly. For simplicity in the
presentation, we omit them here. □

Proof of Theorem 1. By Lemmas 1, 5 and the martingale central
limit theorem [Billingsley (1995)], it is readily shown that as $n \to \infty$,

$$\frac{C_{nk} - E(C_{nk})}{v_{nk}} \xrightarrow{d} N(0, 1).$$

Substituting $C_{nk}$ for $W_{nk}$, Theorem 1 follows by noting $W_{nk} = C_{nk} + o_p(v_{nk})$.
□

Proof of Theorem 2. Note that $\operatorname{Var}\{V_{nk}/\operatorname{tr}(\Sigma^2)\} \to 0$,
$E\{V_{nk}/\operatorname{tr}(\Sigma^2)\} = r_k$ and $\limsup_n r_k \le 1$. It can be shown that for any $\eta > 0$,
$\lim_{n\to\infty}P(B_{n,\eta}) = 1$ where $B_{n,\eta} = \{V_{nk} < (1 + \eta)\operatorname{tr}(\Sigma^2)\}$. This means that
for any $\varepsilon > 0$, there exists a positive integer $N$, such that for all $n > N$,
$P(B_{n,\eta}) > 1 - \varepsilon$. Then from (3.4),



Pnk > 



/ ^ rafc -tr(E 2 )+tr[{B fc (E)} 2 ] ^ V nk \ 
V *nk " a tr(S 2 ) ° nk ^ n *) 

. D ^ nfc -tr(E 2 )+tr[{i? fc (£)} 2 ] 

\ v nk 

> P n^m±wm > „ (1 + _ 5 „ t ) _ P(B; 

Therefore, from Theorem 1, 

r ■ f fl • f D [ ^nfc-tr(S 2 ) + tr[{i? fc (S)} 2 ] \ 
hmmf /3 n/c > hmmf P<^ i — - ^ ^!_L > Za (\ + 77) - 5 nk } 

n— >co n— >oo [ i> nk J 



limsupP(B^ ) 

n— too 



> 1 - <5>{z a (l + 7])- liminf 5 nk \ - e. 
The first part of the theorem follows by taking $\varepsilon \to 0$ and $\eta \to 0$.

(ii) The condition $a_{np}^{-1/2}(1 - r_k) \to \infty$ implies that $\delta_{nk} \to \infty$ as $n \to \infty$.
Hence, $\beta_{nk} \to 1$. □

Proof of Theorem 3. First consider the case where $k_0$ is bounded.
Consider $M$ to be a fixed sufficiently large integer. Recall that $T_{nk} = I_{nk} +
J_{nk}$, where

$$I_{nk} = \{W_{nk} - E(W_{nk})\}/V_{nk} \quad\text{and}\quad J_{nk} = E(W_{nk})/V_{nk}.$$

By (4.3), since a^p 2 = Ofo" 1 ), we have n s I n k = O p (n s a np 2 ) — >■ 0, for any 
k<M. Note that 

™ -r k+1 )=n >n(r k+1 -r k ). 

Thus, from (4.3), for k < ko, the condition liminf n (rfc+i — r k ) > implies 
that n s (J nk — J nk +i) ~ n s — > oo in probability, where 5 € (0, 1). Therefore, 

dnl — > oo for A; < fco and d^, = o p (l) for k>k$. Hence, for any 9 > 0, as 
n — )• oo, 

P (\ d nk I > -»• 1 for < k and PfldjJJ I > 9) -> for ft > fc . 

Therefore, for any > and any e > 0, for each A;, there exists a positive 
integer Nk such that for all n > iVfc, 

P(|4fe I < 0) < + 1) ^ any k < k 

and 

P(\d^l \>0)< e/(M + 1) for any k <k<M. 

Note that both ko and M are finite, we can set an N, which is larger than 
all N k such that the above are satisfied. Define, for k < M, B nk := {\cQ | < 9} 
and B n := (f) to 1 U (f)ti ko B ni ) for n > N. Then, for any w G B n , 

feo-l M 

p(^)<^p(s ni )+E p (^)^ e - 

Hence, for any < 5 < 1 and > 0, fe^g A ko- 

For the case of diverging $k_0$, consider $k_0 < M$ and $M = o(p^{1/4})$. For any
$\theta > 0$ and $\delta < 1/2$, let $\varepsilon < \theta/2$ and $\xi > 2\theta$. Denote

$U_1 = \{\omega: n^{\delta}|I_{nk}| < \varepsilon$, for any $k < k_0\}$,

$U_2 = \{\omega: n^{\delta}|I_{nk}| < \varepsilon$, for any $k_0 \le k \le M\}$

and

$U_3 = \{\omega: n^{\delta}(J_{nk} - J_{nk+1}) > \xi$, for any $k < k_0\}$.

Then for any $\omega \in \bigcap_{i=1}^{3}U_i$, we have $n^{\delta}(J_{nk} - J_{nk+1}) > \xi > 2\theta$ for any $k < k_0$
and $n^{\delta}|I_{nk}| < \varepsilon < \theta/2$ for any $k \le M$, which lead to $n^{\delta}|I_{nk} - I_{nk+1}| < \theta$ for
any $k < M$. Therefore,

$$d_{nk} = n^{\delta}(I_{nk} - I_{nk+1}) + n^{\delta}(J_{nk} - J_{nk+1}) > \theta \quad\text{for any } k < k_0$$

and

$$|d_{nk}| \le n^{\delta}|I_{nk} - I_{nk+1}| < \theta \quad\text{for any } k_0 \le k < M.$$

From (4.4), we have $\hat{k}_{\delta,\theta} - k_0 = 0$. It follows that $\bigcap_{i=1}^{3}U_i \subset \{\omega: \hat{k}_{\delta,\theta} - k_0 = 0\}$.
Since $P(\bigcap_{i=1}^{3}U_i) \to 1$ as $n \to \infty$ by Lemma 6, we have $\hat{k}_{\delta,\theta} - k_0 \xrightarrow{p} 0$. □